# Tutorial: Generating Base Population and Household Data

This tutorial will guide you through the process of generating base
population and household data for a specified region using census data.
We’ll use a `CensusDataLoader` class to handle the data
processing and generation.


## Step 1: Set Up File Paths

First, we need to specify the paths to our data files:

``` python
# Path to the population data file. Update with the actual file path.
POPULATION_DATA_PATH = "/path/to/population_data.pkl"

# Path to the household data file. Update with the actual file path.
HOUSEHOLD_DATA_PATH = "/path/to/household_data.pkl"
```

Make sure to replace the placeholder paths with the actual paths to your
data files.
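Before loading anything, it can help to confirm that both paths point to real files. The following is a minimal sketch using only the standard library; `find_missing_files` is a hypothetical helper written for this tutorial, not part of `CensusDataLoader`.

```python
from pathlib import Path


def find_missing_files(paths):
    """Return the subset of `paths` that do not point to an existing file."""
    return [str(p) for p in paths if not Path(p).is_file()]


# Placeholder paths from this step; replace them with your own.
missing = find_missing_files(
    ["/path/to/population_data.pkl", "/path/to/household_data.pkl"]
)
if missing:
    print("Missing data files:", ", ".join(missing))
```

An empty result means every path resolved to a file and loading can proceed.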

## Step 2: Define Age Group Mapping

We’ll define a mapping for age groups to categorize adults and children
in the household data:

``` python
AGE_GROUP_MAPPING = {
    "adult_list": ["20t29", "30t39", "40t49", "50t64", "65A"],  # Age ranges for adults
    "children_list": ["U19"],  # Age range for children
}
```
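To make the role of this mapping concrete, here is a small sketch of how an age-group label could be resolved to "adult" or "child". `classify_age_group` is a hypothetical helper for illustration only; how `CensusDataLoader` uses the mapping internally is not shown in this tutorial.

```python
AGE_GROUP_MAPPING = {
    "adult_list": ["20t29", "30t39", "40t49", "50t64", "65A"],
    "children_list": ["U19"],
}


def classify_age_group(label, mapping):
    """Return "adult" or "child" for a label, or raise if it is unmapped."""
    if label in mapping["adult_list"]:
        return "adult"
    if label in mapping["children_list"]:
        return "child"
    raise KeyError(f"Age group {label!r} is not covered by the mapping")


# A label should never appear in both lists.
overlap = set(AGE_GROUP_MAPPING["adult_list"]) & set(AGE_GROUP_MAPPING["children_list"])
assert not overlap, f"Age groups listed twice: {overlap}"

print(classify_age_group("30t39", AGE_GROUP_MAPPING))  # adult
print(classify_age_group("U19", AGE_GROUP_MAPPING))    # child
```

Raising on unmapped labels, rather than silently picking a category, makes it easier to spot age groups in your data that the mapping does not cover.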

## Step 3: Load Data

Now, let’s load the population and household data:

``` python
import pandas as pd

# Load household data
HOUSEHOLD_DATA = pd.read_pickle(HOUSEHOLD_DATA_PATH)

# Load population data
BASE_POPULATION_DATA = pd.read_pickle(POPULATION_DATA_PATH)
```
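After loading, a quick summary of each data frame can catch problems early, such as an empty file or unexpected missing values. The sketch below assumes the pickles contain pandas data frames; `summarize` is a hypothetical helper, and the toy column names are invented for illustration and may not match your census data.

```python
import pandas as pd


def summarize(df):
    """Return basic facts about a data frame: size, columns, missing cells."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "missing_cells": int(df.isna().sum().sum()),
    }


# Toy stand-in for the loaded census data; the column names are invented.
toy_population = pd.DataFrame(
    {
        "area": [1001, 1001, 1002],
        "age": ["U19", "20t29", None],
        "gender": ["female", "male", "female"],
    }
)
print(summarize(toy_population))
```

With real data you would call `summarize(BASE_POPULATION_DATA)` and `summarize(HOUSEHOLD_DATA)` instead of using the toy frame.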

## Step 4: Set Up Additional Parameters

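Next, define two optional inputs. `area_selector` is passed to
`generate_basepop` in Step 6; `geo_mapping` is a placeholder that the
remaining steps do not use. Setting either to `None` means no criteria
or mapping are supplied: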

``` python
# Placeholder for area selection criteria, if any. Update or use as needed.
area_selector = None

# Placeholder for geographic mapping data, if any. Update or use as needed.
geo_mapping = None
```

## Step 5: Initialize the Census Data Loader

Create an instance of the `CensusDataLoader` class:

``` python
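# Import path assumed from the AgentTorch package layout; adjust if yours differs.
from agent_torch.data.census.census_loader import CensusDataLoader

census_data_loader = CensusDataLoader(n_cpu=8, use_parallel=True)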
```

This initializes the loader with 8 CPUs and enables parallel processing
for faster data generation.
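Hardcoding 8 workers may not suit every machine. As a minimal sketch, the worker count can be derived from the cores actually available; `choose_n_cpu` is a hypothetical helper, not part of `CensusDataLoader`.

```python
import os


def choose_n_cpu(requested, available=None):
    """Clamp the requested worker count to the number of available cores."""
    if available is None:
        # os.cpu_count() can return None when the count is undetermined.
        available = os.cpu_count() or 1
    return max(1, min(requested, available))


n_cpu = choose_n_cpu(8)
print(f"Using {n_cpu} worker(s)")
```

The result could then be passed as `n_cpu=n_cpu` when constructing the loader.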

## Step 6: Generate Base Population Data

Generate the base population data for a specified region:

``` python
census_data_loader.generate_basepop(
    input_data=BASE_POPULATION_DATA,  # The population data frame
    region="astoria",  # The target region for generating base population
    area_selector=area_selector,  # Area selection criteria, if applicable
)
```

This generates the base population for the “astoria” region. The
generated data is exported to a folder named “astoria” under the
“populations” folder. To limit the output to a fixed number of
individuals, pass `num_individuals` as shown in the bonus section below.

## Step 7: Generate Household Data

Finally, generate the household data for the specified region:

``` python
census_data_loader.generate_household(
    household_data=HOUSEHOLD_DATA,  # The loaded household data
    household_mapping=AGE_GROUP_MAPPING,  # Mapping of age groups for household composition
    region="astoria"  # The target region for generating households
)
```

This will create household data for the “astoria” region based on the
previously generated base population. The generated data will be
exported to the same “astoria” folder under the “populations” folder.
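To confirm that both steps wrote their output, you can list the contents of the region folder. This sketch assumes the “populations” folder is resolved relative to the current working directory, which may not hold for your installation; pass a different `root` if it lives elsewhere. `list_generated_files` is a hypothetical helper, and the names of the exported files are not specified in this tutorial.

```python
from pathlib import Path


def list_generated_files(region, root="populations"):
    """Return sorted file names found in <root>/<region>, or [] if absent."""
    folder = Path(root) / region
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir() if p.is_file())


print(list_generated_files("astoria"))
```

An empty list means the folder does not exist yet or contains no files.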

## Bonus: Generate Population Data of Specific Size

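For quick experimentation, you can cap the population size with `num_individuals`. The call below saves only the first 100 individuals of the generated population: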

``` python
census_data_loader.generate_basepop(
    input_data=BASE_POPULATION_DATA,  # The population data frame
    region="astoria",  # The target region for generating base population
    area_selector=area_selector,  # Area selection criteria, if applicable
    num_individuals=100,  # Save only the first 100 individuals of the generated population
)
```

## Bonus: Export Population Data

If you have already generated a synthetic population, you only need to export it to the “populations” folder under the desired region to use it with AgentTorch.

``` python
POPULATION_DATA_PATH = "/population_data.pickle"  # Replace with actual path
census_data_loader.export(population_data_path=POPULATION_DATA_PATH, region="astoria")
```

To export data for only a few individuals, pass `num_individuals`:

``` python
census_data_loader.export(
    population_data_path=POPULATION_DATA_PATH,
    region="astoria",
    num_individuals=100,
)
```

## Conclusion

You have now successfully generated both base population and household
data for the “astoria” region. The generated data can be found in the
“populations/astoria” folder. You can modify the region name, population
size, and other parameters to generate data for different scenarios.
