# Data generation

### Synthesizing data into a JSON file

In this notebook, I generate random student profile data and save it to a JSON file.

### Sample data format

Each document in the dataset has the following format:

```json
{
    "name": "John Doe",
    "age": 30,
    "city": "New York",
    "email": "johndoe@example.com",
    "is_student": false,
    "hobbies": ["reading", "hiking", "cooking"],
    "scores": {
        "math": 95,
        "english": 88,
        "history": 76
    }
}
```
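
As a rough illustration of this format, the sketch below checks that a record has the expected fields and top-level types. The `EXPECTED_FIELDS` mapping and the `validate_profile` helper are hypothetical names introduced here for illustration; they are not part of the notebook.

```python
# Hypothetical mapping of field name -> expected Python type after json.load
EXPECTED_FIELDS = {
    "name": str,
    "age": int,
    "city": str,
    "email": str,
    "is_student": bool,
    "hobbies": list,
    "scores": dict,
}


def validate_profile(record):
    """Return a list of problems found in one profile record (empty if none)."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems


# The sample document from above, written as a Python dict
sample = {
    "name": "John Doe",
    "age": 30,
    "city": "New York",
    "email": "johndoe@example.com",
    "is_student": False,
    "hobbies": ["reading", "hiking", "cooking"],
    "scores": {"math": 95, "english": 88, "history": 76},
}

print(validate_profile(sample))  # prints []
```

This only checks top-level types; it does not look inside `hobbies` or `scores`.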

In [10]:
import json
import random

# List of candidate names
candidate_names = ["Alice Smith", "Bob Johnson", "Charlie Brown", "David Lee", "Eva Davis", "Frank Wilson", "Grace Turner", "Henry Harris", "Ivy Robinson", "Jack Martinez"]

# List of cities
cities = ["New York", "Los Angeles", "Chicago", "Houston", "Miami", "San Francisco", "Dallas", "Atlanta", "Seattle", "Boston"]

# List of hobbies
hobbies = ["Reading", "Hiking", "Cooking", "Photography", "Painting", "Swimming", "Gardening", "Traveling", "Playing Music", "Dancing"]

# Create a list to store the dataset rows
dataset = []

# Generate sample data for 10 rows
for _ in range(10):
    data = {
        "name": random.choice(candidate_names),
        "age": random.randint(20, 40),  # Random age between 20 and 40
        "city": random.choice(cities),
        "email": f"{random.choice(candidate_names).replace(' ', '.').lower()}@example.com",  # Email from a second random pick, so it may not match "name"
        "is_student": random.choice([True, False]),
        "hobbies": random.sample(hobbies, random.randint(1, len(hobbies))),  # Random subset of 1 to 10 distinct hobbies
        "scores": {
            "math": random.randint(60, 100),      # Random math score between 60 and 100
            "english": random.randint(60, 100),   # Random English score between 60 and 100
            "history": random.randint(60, 100)    # Random history score between 60 and 100
        }
    }
    dataset.append(data)

# Serialize the dataset to a JSON string
json_data = json.dumps(dataset, indent=4)

# Print the JSON data
print(json_data)

# Save the JSON data to a local file (the ./data directory must already exist)
with open("./data/dataset.json", "w") as json_file:
    json.dump(dataset, json_file, indent=4)

print("Dataset saved to './data/dataset.json'")


[
    {
        "name": "David Lee",
        "age": 29,
        "city": "Chicago",
        "email": "bob.johnson@example.com",
        "is_student": false,
        "hobbies": [
            "Cooking",
            "Photography",
            "Swimming",
            "Dancing",
            "Playing Music",
            "Painting",
            "Traveling",
            "Hiking",
            "Reading"
        ],
        "scores": {
            "math": 93,
            "english": 62,
            "history": 79
        }
    },
    {
        "name": "David Lee",
        "age": 38,
        "city": "Houston",
        "email": "bob.johnson@example.com",
        "is_student": true,
        "hobbies": [
            "Playing Music",
            "Painting"
        ],
        "scores": {
            "math": 73,
            "english": 62,
            "history": 68
        }
    },
    {
        "name": "David Lee",
        "age": 38,
        "city": "Seattle",
        "email": "bob.johnson@example.com",
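
In the generation loop, the email comes from a second `random.choice` call, so it need not match the `name` field, as the records above show. If matching values are wanted, one possible variant is sketched below. The `make_name_and_email` helper, the shortened name list, and the seed value are assumptions made for this example, not part of the original notebook.

```python
import random

# Short stand-in list; the notebook's full candidate_names list works the same way
candidate_names = ["Alice Smith", "Bob Johnson", "Charlie Brown"]


def make_name_and_email(rng):
    """Pick one name and build the email address from that same name."""
    name = rng.choice(candidate_names)
    email = f"{name.replace(' ', '.').lower()}@example.com"
    return name, email


# A fixed seed (an arbitrary choice here) makes reruns produce the same rows
rng = random.Random(42)
for _ in range(3):
    name, email = make_name_and_email(rng)
    print(name, email)
```

Passing a dedicated `random.Random` instance keeps the seeding local to this helper instead of changing the module-level generator.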
   

### Conclusion

Ten random student profiles were generated and saved to `./data/dataset.json`.
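
To confirm that the file was written as expected, it can be read back with `json.load`. The sketch below is one way to do that; the `summarize_profiles` helper is a hypothetical name, and the mean math score is just an example of a quick sanity check.

```python
import json


def summarize_profiles(path):
    """Load saved profiles and return (record count, mean math score)."""
    with open(path) as json_file:
        profiles = json.load(json_file)
    math_scores = [profile["scores"]["math"] for profile in profiles]
    # Avoid dividing by zero if the file holds an empty list
    mean_math = sum(math_scores) / len(math_scores) if math_scores else None
    return len(profiles), mean_math
```

Calling `summarize_profiles("./data/dataset.json")` after running the notebook should report 10 records, since the loop generates 10 rows.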