# Strategy for Finding a Business Location

To find the best business location, start by analyzing your target market, researching potential areas, and considering factors like accessibility, infrastructure, and zoning regulations.
Here's a more detailed strategy:  

##  1. Define Your Needs and Goals: 
- **Target Market:** Identify your ideal customer base and where they are located
    - For a definition, see [Investopedia: Target Market](https://www.investopedia.com/terms/t/target-market.asp#:~:text=Demographic%3A%20These%20are%20the%20main,in%20the%20era%20of%20globalization.)
- **Business Type:** Determine if your business requires 
    - high foot traffic
    - proximity to suppliers
    - access to specific infrastructure (e.g., highways, rail yards)
- **Budget:** Establish a realistic budget for 
    - rent
    - utilities
    - other location-related costs

### Target Market Analysis

In [None]:
# Example: Laundromat
laundromat_target_market = {
    "description": "Laundromats serve individuals and families who lack in-unit laundry facilities, offering self-service washing and drying options. They cater to a diverse clientele, including renters, students, and busy professionals.",
    "demographics": {
        "age_range": "18–55", 
        "gender_distribution": {"female": 60, "male": 40},
        "household_income": {"median": 28_000, "range": "15,000–60,000"},
        "primary_residence": ["renters", "apartment dwellers", "students", "dual-income households", "middle-aged individuals"],
        "interpretation_and_strategic_insight": "The laundromat clientele predominantly comprises renters and apartment dwellers, with a significant portion being students and dual-income households. Understanding this demographic is crucial for tailoring services and marketing efforts."
    },
    "usage_patterns": {
        "frequency": {"weekly": 60, "weekend_usage": 70},  # assumed to be percentages
        "average_spend_per_visit": {"low_end": 10, "high_end": 20},  # USD
        "average_time_spent_per_visit": {"min": 60, "max": 90},  # minutes
        "interpretation_and_strategic_insight": "Most customers visit laundromats weekly, with peak usage occurring on weekends, particularly Saturdays. Offering promotions or extended hours during these times can attract more customers."
    },
    "kpis_and_metrics_to_track": {
        "operational_efficiency_metrics": {
            "machine_utilization_rate": {
                "description": "Measures the percentage of time machines are in use versus their total available time.",
                "calculated_metric": "machine_downtime_per_load"
            },
            "utility_costs_per_load": {
                "description": "Evaluates the utility cost (water, electricity, gas) per load of laundry.",
                "calculated_metric": "cost_savings_from_efficiency_measures"
            },
            "machine_downtime_percentage": {
                "description": "Measures the percentage of time machines are inactive due to issues like maintenance or repair.",
                "calculated_metric": "revenue_loss_due_to_downtime"
            }
        },
        "financial_performance_metrics": {
            "gross_income_and_profitability": {
                "description": "Monitors total revenue and net profit after expenses to assess financial health.",
                "calculated_metric": "profitability_ratio"
            },
            "average_order_value": {
                "description": "Tracks the average amount spent per customer transaction.",
                "calculated_metric": "aov_growth_rate"
            },
            "cost_per_pound_of_laundry": {
                "description": "Measures average cost to process each pound of laundry.",
                "calculated_metric": "cost_efficiency_index"
            },
            "profit_margin": {
                "description": "Indicates what percentage of revenue remains after all expenses.",
                "calculated_metric": "margin_improvement_percentage"
            }
        },
        "customer_satisfaction_and_retention_metrics": {
            "customer_retention_rate": {
                "description": "Measures the percentage of customers who return over a given time period."
            },
            "net_promoter_score": {
                "description": "Assesses customer loyalty based on likelihood to recommend the laundromat."
            },
            "customer_feedback_and_complaints": {
                "description": "Tracks feedback and complaints to identify customer needs and improve service quality."
            }
        }
    },
    "location_proximity": {
        "within_one_mile": 87
    },
    "revenue_statistics": {
        "average_revenue_per_laundromat": 142_000,
        "average_revenue_per_machine": 1_500,
        "profit_margin": {"estimated_low": 15, "estimated_high": 30},
        "startup_cost_range": {"low": 200_000, "high": 500_000}
    },
    "sources": [
        {"name": "Skyrocket BPO Laundromat Industry Analysis", 
         "url": "https://www.skyrocketbpo.com/laundromat-industry-analysis"},
         
        {"name": "Crucial Stats Every Laundromat Owner Must Know Before Starting or Investing", 
         "url": "https://www.turnsapp.com/blog/key-statistics-every-laundromat-owner-should-know-before-starting-or-investing"},

        {"name": "How Laundromats Combat Period Poverty and Period Stigma", 
         "url": "https://endometriosis.net/living/laundromats"},

        {"name": "Essential Laundromat Equipment for Starting Your Business", 
         "url": "https://metrobi.com/blog/essential-laundromat-equipment-for-your-business"},

        {"name": "KPIs to keep track of", 
         "url": "https://www.flexwasher.com/laundromat-kpis-and-metrics"}
    ],
    "anecdote": [
        {"name": "We bought a laundromat and it's all about the numbers", 
         "url": "https://laundromats101.com/2019/01/we-bought-a-laundromat-and-its-all-about-the-numbers"}
    ]
}
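The KPI descriptions above reduce to simple ratios. The sketch below is a minimal illustration, assuming you record machine hours, transactions, revenue, expenses, and customer counts yourself; the function names are hypothetical, and the retention formula is the common `100 * (end - new) / start` convention rather than one taken from the listed sources.

```python
def _check_positive(value: float, name: str) -> None:
    """Guard against division by zero or negative denominators."""
    if value <= 0:
        raise ValueError(f"{name} must be positive")


def machine_utilization_rate(hours_in_use: float, hours_available: float) -> float:
    """Percent of available machine time the machines were running."""
    _check_positive(hours_available, "hours_available")
    return 100 * hours_in_use / hours_available


def machine_downtime_percentage(hours_down: float, hours_available: float) -> float:
    """Percent of available time machines were out of service (maintenance, repair)."""
    _check_positive(hours_available, "hours_available")
    return 100 * hours_down / hours_available


def average_order_value(total_revenue: float, transactions: int) -> float:
    """Average amount spent per customer transaction."""
    _check_positive(transactions, "transactions")
    return total_revenue / transactions


def profit_margin(revenue: float, expenses: float) -> float:
    """Percent of revenue left after all expenses."""
    _check_positive(revenue, "revenue")
    return 100 * (revenue - expenses) / revenue


def customer_retention_rate(customers_at_end: int, new_customers: int, customers_at_start: int) -> float:
    """Percent of starting customers still active at period end: 100 * (end - new) / start."""
    _check_positive(customers_at_start, "customers_at_start")
    return 100 * (customers_at_end - new_customers) / customers_at_start


# Hypothetical month of data for one store
print(machine_utilization_rate(hours_in_use=3_000, hours_available=10_000))  # 30.0
print(average_order_value(total_revenue=12_000, transactions=800))  # 15.0
print(profit_margin(revenue=12_000, expenses=9_000))  # 25.0
print(customer_retention_rate(customers_at_end=450, new_customers=90, customers_at_start=400))  # 90.0
```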

## Core Business Type Considerations

- **High Foot Traffic**:  
  Required — areas near apartment complexes, universities, and public transport hubs are ideal.

- **Proximity to Suppliers**:  
  Moderately Important — access to maintenance services and detergent vendors is helpful but not critical.

- **Access to Infrastructure (e.g., highways, rail yards)**:  
  Not Required — but local road access and available parking are very important for customer convenience.


## Budget
- The following Python dictionary summarizes the estimated laundromat startup and monthly operating budget for Pennsylvania, compiled from the sources listed at its end.

In [None]:
laundromat_budget_pa = {
    "startup_costs": {
        "lease_deposit": {
            "estimated": [10000, 30000],
            "description": "Typically 3–6 months' rent upfront."
        },
        "commercial_equipment": {
            "estimated": [40000, 260000],
            "description": "Includes washers, dryers, folding tables, POS systems."
        },
        "renovations_buildout": {
            "estimated": [20000, 50000],
            "description": "Plumbing, electrical, interior layout, flooring."
        },
        "licenses_permits": {
            "estimated": [500, 2000],
            "description": "Business license, health inspections, local permits."
        },
        "marketing_signage": {
            "estimated": [2000, 5000],
            "description": "Initial promotional material, grand opening, branding."
        },
        "insurance": {
            "estimated_annual": [2000, 3000],
            "description": "Property, liability, workers’ compensation."
        },
        "contingency_fund": {
            "estimated_percent_of_total_investment": [5, 10],
            "description": "Reserved for unexpected expenses."
        },
        "total_startup_estimate": [101000, 420000]
    },
    "monthly_operating_costs": {
        "lease_or_mortgage": {
            "estimated": [5000, 10000],
            "description": "Rental or financing costs for commercial space."
        },
        "utilities": {
            "estimated": [4500, 9000],
            "description": "Water, electricity, gas — laundromats are utility-heavy."
        },
        "employee_wages": {
            "estimated": [2000, 3500],
            "description": "Attendants, shift managers, etc."
        },
        "maintenance_repairs": {
            "estimated": [500, 5000],
            "description": "Regular and emergency servicing of machines and infrastructure."
        },
        "supplies_inventory": {
            "estimated": [1000, 2000],
            "description": "Detergents, vending machine items, cleaning products."
        },
        "insurance": {
            "monthly_equivalent": [125, 250],
            "description": "Liability and property insurance (monthly portion)."
        },
        "total_monthly_estimate": [13500, 29500]
    },
    "additional_info": {
        "revenue_potential": {
            "annual": [30000, 300000],
            "profit_margin_percent": [20, 35]
        },
        "financing_options": [
            "SBA loans",
            "Equipment financing",
            "Business line of credit"
        ],
        "location_factors": [
            "Higher rent in urban areas balanced by higher foot traffic and customer volume"
        ]
    },
    "sources": [
        {
            "name": "Upmetrics",
            "url": "https://upmetrics.co/startup-costs/laundromat"
        },
        {
            "name": "Consulting Times",
            "url": "https://consultingtimes.com/how-much-does-it-cost-to-start-a-laundromat-business"
        },
        {
            "name": "The Pricer",
            "url": "https://www.thepricer.org/cost-to-start-a-laundromat"
        },
    ]
}
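The stated totals in the budget can be cross-checked against the itemized ranges. The helper below (hypothetical name `sum_ranges`) adds up every `[low, high]` pair stored under a given key; the demo copies the monthly line items from the dictionary above so the block runs on its own.

```python
def sum_ranges(section: dict, key: str) -> tuple:
    """Sum the [low, high] pairs stored under `key` in each line item of a budget section.

    Line items without `key` (and the stated totals, which are plain lists) are skipped.
    """
    low = high = 0
    for item in section.values():
        if isinstance(item, dict) and key in item:
            item_low, item_high = item[key]
            low += item_low
            high += item_high
    return low, high


# Monthly line items copied from laundromat_budget_pa above so this cell runs on its own
monthly_costs = {
    "lease_or_mortgage": {"estimated": [5000, 10000]},
    "utilities": {"estimated": [4500, 9000]},
    "employee_wages": {"estimated": [2000, 3500]},
    "maintenance_repairs": {"estimated": [500, 5000]},
    "supplies_inventory": {"estimated": [1000, 2000]},
    "insurance": {"monthly_equivalent": [125, 250]},
    "total_monthly_estimate": [13500, 29500],
}
print(sum_ranges(monthly_costs, "estimated"))           # (13000, 29500)
print(sum_ranges(monthly_costs, "monthly_equivalent"))  # (125, 250)
print(monthly_costs["total_monthly_estimate"])          # [13500, 29500]
```

Applied to `laundromat_budget_pa["startup_costs"]` with key `"estimated"`, the itemized one-time costs sum to 72,500 to 347,000, and they still fall short of the stated 101,000 to 420,000 after adding annual insurance and the 5 to 10% contingency. The stated totals may include costs the sources did not itemize, so it is worth checking them against the original sources before relying on them.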

## Conclusion

### 1. Demographic Compatibility
- Target neighborhoods with:
  - High concentration of renters
  - Low-to-moderate income households
  - Students, seniors, or young professionals
- Use tools like **U.S. Census Data** or **local property databases**.

### 2. Parking and Accessibility
- Ensure **ample off-street parking**
- **ADA compliance** (ramps, wide entryways) is both legally required (laundromats are public accommodations under ADA Title III) and commercially wise

### 3. Visibility & Signage
- Ideal locations:
  - Street-facing
  - Corner lots
  - High pedestrian zones
- Invest in **well-lit, branded signage**

### 4. Competitor Proximity
- Avoid clustering unless offering **superior services**:
  - Wash-and-fold
  - Free Wi-Fi
  - Loyalty programs
- Check **Google Maps** or **Yelp** for competitor density
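Once competitor addresses from Google Maps or Yelp have been geocoded to latitude/longitude, a great-circle (haversine) distance gives a quick count of competitors within a radius. The sketch below assumes those coordinates are already available; the function names and sample coordinates are hypothetical, and the one-mile default echoes the `location_proximity` entry in the target-market dictionary.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def competitors_within(site: tuple, competitors: dict, radius_miles: float = 1.0) -> list:
    """Names of competitors whose distance from `site` is at most `radius_miles`."""
    lat, lon = site
    return [
        name
        for name, (c_lat, c_lon) in competitors.items()
        if haversine_miles(lat, lon, c_lat, c_lon) <= radius_miles
    ]


# Hypothetical coordinates; replace with geocoded addresses of real competitors
candidate_site = (40.600, -79.560)
nearby_laundromats = {
    "Competitor A": (40.605, -79.565),  # roughly 0.4 miles away
    "Competitor B": (40.650, -79.560),  # roughly 3.5 miles away
}
print(competitors_within(candidate_site, nearby_laundromats))  # ['Competitor A']
```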

### 5. Safety & Security
- Choose well-lit, low-crime areas
- Install visible **security cameras** to promote customer trust

### 6. Zoning and Utilities
- Confirm **zoning allows commercial laundry services**
- Ensure access to:
  - **3-phase power**
  - **High water pressure**
  - **Proper drainage systems**

### 7. Neighboring Businesses
- Favor adjacency to:
  - Coffee shops
  - Takeout restaurants
  - Convenience stores

These businesses give customers something to do during their **dwell time** (typically 60 to 90 minutes per visit) and create cross-business synergy.

### 8. Budget Considerations
- Total startup costs range from **<span>&#36;</span>101,000 to <span>&#36;</span>420,000**, driven primarily by equipment and renovation expenses.
- Monthly operating costs fall between **<span>&#36;</span>13,500 and <span>&#36;</span>29,500**, reflecting the utility-intensive, service-oriented nature of the business.
- Annual revenue potential varies widely, from **<span>&#36;</span>30,000 to <span>&#36;</span>300,000**. The low end would not cover even the minimum operating costs (12 × <span>&#36;</span>13,500 = <span>&#36;</span>162,000 per year), so the quoted margins only hold toward the upper end of the revenue range.
- Profit margins of **20% to 35%** suggest a solid opportunity for profitability, especially in high-traffic urban locations.
- Strategic financing and thorough planning are essential to manage costs and maximize returns.
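The ranges above can be turned into a rough payback period: startup cost divided by annual profit, where annual profit is revenue times margin. This is a simplified sketch, not a financial model; it ignores financing costs, taxes, and the months a new store needs to build up customers. The function name is hypothetical; the inputs are figures quoted earlier in this notebook.

```python
import math


def payback_years(startup_cost: float, annual_revenue: float, profit_margin_pct: float) -> float:
    """Years for annual profit (revenue x margin) to recover the startup cost.

    Simplifying assumptions: revenue and margin stay constant, and financing
    costs, taxes, and the ramp-up period after opening are ignored.
    """
    annual_profit = annual_revenue * profit_margin_pct / 100
    if annual_profit <= 0:
        return math.inf  # the investment is never recovered
    return startup_cost / annual_profit


# Best case from the figures above: lowest startup cost, highest revenue and margin
print(round(payback_years(101_000, 300_000, 35), 1))  # 1.0
# Conservative case: highest startup cost, average revenue per laundromat (142,000), 20% margin
print(round(payback_years(420_000, 142_000, 20), 1))  # 14.8
```

The spread between the two cases (about one year versus almost fifteen) is a concrete way to see why location, which drives revenue, matters as much as controlling startup costs.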


## 2. Research Potential Locations:
- **Demographics:** Research the demographics of potential areas to ensure they align with your target market
- **Competition:** Analyze the competitive landscape in each area to understand the level of competition
- **Traffic:** Consider traffic patterns and parking availability, especially if foot traffic is important
- **Psychographic Information (optional):** Go beyond basic demographics to explore customer lifestyles, values, and habits. 
    - For instance, are there large groups of environmentally conscious consumers in your area who might appreciate eco-friendly laundry solutions?

### Locate & Analyze Customers and Market with **Census Business Builder**
- [Video How-To](https://www.census.gov/data/academy/data-gems/2023/locate-analyze-customers-market-with-cbb.html)
- [Census Business Builder](https://cbb.census.gov/cbb/)

### Begin Research on Businesses, Markets, Demographics, etc.

In [None]:
# Import modules
import pandas as pd
import os
import json
from us import states
import requests
import zipcodes
import addfips
from sklearn.preprocessing import MinMaxScaler
from IPython.display import display

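# Local module (not included here) that defines the Census API key
from ipython_config import CENSUS_KEY

# Folder used to cache downloaded metadata and CSV extracts
os.makedirs('data', exist_ok=True)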

In [None]:
# Coin-Operated Laundries and Drycleaners
NAICS = '812310'

# Vandergrift, PA
ZIPCODE = '15690'

state = states.PA
STATEFIPS = state.fips
STATENAME = state.name

result = zipcodes.matching(ZIPCODE)[0]
COUNTYNAME = result['county']

af = addfips.AddFIPS()

# Get FIPS code for a single county
COUNTYFIP = af.get_county_fips(county=COUNTYNAME, state=STATENAME)
COUNTYCODE = COUNTYFIP[2:]

print(f"""
Initial:
NAICS: {NAICS}
ZIPCODE: {ZIPCODE}
STATEFIPS: {STATEFIPS}
STATENAME: {STATENAME}
COUNTYNAME: {COUNTYNAME} 
COUNTYFIP: {COUNTYFIP}  
COUNTYCODE: {COUNTYCODE}
""")
print("Secondary:")
for key, value in result.items():
    print(f"{key.replace('_', '').upper()}: {value}")

## APIs for Business Statistics
- Nonemployer Statistics
- County Business Patterns
- Economic Census
- American Community Survey 1-Year Data (2005-2023)
- American Community Survey 5-Year Data (2009-2023)

In [None]:
def check_file_exist(url):
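    # Keep the year in the cache file name (e.g. 2023_acs_acs5_variables.json) so vintages don't overwrite each other
    filename = '_'.join(url.split('/')[4:])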
    file_path_os = f'data/{filename}'

    res = {'exist': False,
            'file_path': file_path_os}
    
    if os.path.exists(file_path_os):
        res['exist'] = True

    return res


def download_census_variables(url):
    """
    Load Census variable metadata for `url`, downloading it and caching it as a
    JSON file under data/ the first time it is requested.

    Parameters:
    - url (str): The variables.json URL to fetch.

    Returns:
    - dict: Parsed JSON metadata, or None if the request or file access fails
      (errors are printed rather than raised).
    """

    filename = check_file_exist(url)
    filename_exist = filename['exist']
    filename_path = filename['file_path']

    try:
        if filename_exist:
            with open(filename_path, 'r') as f:
                data = json.load(f)
            print(f"Loaded cached {filename_path}")
        else:
            response = requests.get(url, timeout=60)
            response.raise_for_status()
            data = response.json()
            print(f"Successfully downloaded {url}")

            # Make sure the cache directory exists before writing
            os.makedirs(os.path.dirname(filename_path), exist_ok=True)
            with open(filename_path, 'w') as f:
                json.dump(data, f)
            print(f"Successfully saved {url} to {filename_path}")

        return data

    except requests.exceptions.HTTPError as http_err:
        print(f"HTTP error occurred: {http_err}")
    except requests.exceptions.RequestException as req_err:
        print(f"Request error: {req_err}")
    except OSError as file_err:
        print(f"File error: {file_err}")
    except Exception as err:
        print(f"An unexpected error occurred: {err}")


def create_data_dictionary(url, relevant_variables):
    data = download_census_variables(url)
    if data is None:  # download or cache read failed; the error was already printed
        return
    variables_df = pd.DataFrame(data['variables']).T.reset_index()

    relevant_cols = relevant_variables.split(',')

    filtered_df = variables_df[variables_df['index'].isin(relevant_cols)][['index', 'label', 'concept']]

    for _, row in filtered_df.iterrows():
        var_name = row['index']
        description = f"{row['label']} {row['concept']}"
        print(f"'{var_name}': '{description}',")


def search_variables_in_data_dictionary(url, word):
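    word = word.lower()
    data = download_census_variables(url)
    if data is None:  # download or cache read failed; the error was already printed
        return
    variables_df = pd.DataFrame(data['variables']).T.reset_index().astype(str)

    variables_df['label_concept'] = (variables_df['label'] + ' ' + variables_df['concept']).str.lower()

    # regex=False: treat the search term as plain text, not a regular expression
    filtered_df = variables_df[variables_df['label_concept'].str.contains(word, regex=False)][['index', 'label', 'concept']]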

    for _, row in filtered_df.iterrows():
        var_name = row['index']
        description = f"{row['label']} {row['concept']}"
        print(f"'{var_name}': '{description}',")


In [None]:
# Test
url = "https://api.census.gov/data/2023/acs/acs5/variables.json"
relevant_variables = ','.join(['C17002_002E', 'C17002_003E', 'B01003_001E'])
word = 'one race'

create_data_dictionary(url, relevant_variables)
search_variables_in_data_dictionary(url, word)


In [None]:
base_url = "https://api.census.gov/data"

# Nonemployer Statistics
nonemp_params = {
    'variables' : "",
    'geography': f"county:{COUNTYCODE}&in=state:{STATEFIPS}",
    'api_key': CENSUS_KEY,
    'dataset_base': 'nonemp',
    'year': ''
}

# County Business Patterns
cbp_params = {
    'variables' : "",
    'geography': f"county:{COUNTYCODE}&in=state:{STATEFIPS}",
    'api_key': CENSUS_KEY,
    'dataset_base': 'cbp',
    'year': ''
}

# Economic Census
ecnbasic_params = {
    'variables' : "",
    'geography': f"county:{COUNTYCODE}&in=state:{STATEFIPS}",
    'api_key': CENSUS_KEY,
    'dataset_base': 'ecnbasic',
    'year': ''
}

# ACS 1-Year Estimates
acs1_params = {
    'variables' : "",
    'geography': f"county:{COUNTYCODE}&in=state:{STATEFIPS}",
    'api_key': CENSUS_KEY,
    'dataset_base': 'acs/acs1',
    'year': ''
}

# ACS 5-Year Estimates
acs5_params = {
    'variables' : "",
    'geography': f"tract:*&in=state:{STATEFIPS}&in=county:{COUNTYCODE}",
    'api_key': CENSUS_KEY,
    'dataset_base': 'acs/acs5',
    'year': ''
}

In [None]:
def get_data(abs_params):
    """
    Retrieve data from the U.S. Census Bureau API.

    Parameters:
    - abs_params (dict): Dictionary with the following keys:
        - 'year': (str) Year of the dataset (e.g., "2020")
        - 'dataset_base': (str) Dataset name (e.g., "acs/acs5")
        - 'variables': (str) Comma-separated variable names
        - 'geography': (str) Geographic level (e.g., "us:1")
        - 'api_key': (str) Census Bureau API key (optional)

    Returns:
    - list: Parsed JSON data from the API.

    Raises:
    - Exception: If request fails or required parameters are missing.
    """
    base_url = "https://api.census.gov/data"

    try:
        year = abs_params['year']
        dataset = abs_params['dataset_base']
        variables = abs_params['variables']
        geography = abs_params['geography']
        api_key = abs_params.get('api_key', '')

        url = f"{base_url}/{year}/{dataset}?get={variables}&for={geography}"
        if api_key:
            url += f"&key={api_key}"

        # Safe debug print without exposing API key
        print(f"Requesting Census data for {year}, dataset: {dataset}, geography: {geography}")

        response = requests.get(url, timeout=60)
        response.raise_for_status()
        return response.json()

    except KeyError as e:
        raise KeyError(f"Missing required parameter: {e}")
    except requests.RequestException as e:
        raise Exception(f"API request failed: {e}")


def get_all_business_code_table(df, naics_code, naics_column):
    """
    Filters a DataFrame to include rows matching all prefix levels of a given NAICS code.

    Parameters:
    - df (pd.DataFrame): DataFrame containing NAICS data.
    - naics_code (str): Full NAICS code to extract prefix matches for.
    - naics_column (str): Name of the column in `df` containing NAICS codes.

    Returns:
    - pd.DataFrame: Filtered DataFrame with all matching NAICS code prefixes.
    """
    full_list = []
    for i in range(1, len(naics_code) + 1):
        naics_prefix = naics_code[:i]
        matching_rows = df[df[naics_column] == naics_prefix]
        if not matching_rows.empty:
            full_list.append(matching_rows)

    if full_list:
        return pd.concat(full_list, ignore_index=True)
    else:
        return pd.DataFrame(columns=df.columns)
    
    
def combine_and_rename_naics(df_list):
    """
    Combines a list of DataFrames and standardizes NAICS columns.
    Handles missing year-specific NAICS columns gracefully.
    
    Parameters:
        df_list (list of pd.DataFrame): List of DataFrames to combine.
        
    Returns:
        pd.DataFrame: Cleaned and combined DataFrame.
    """
    df = pd.concat(df_list, ignore_index=True)

    # Safely retrieve NAICS columns if they exist
    naics_cols = ['NAICS2022', 'NAICS2017', 'NAICS2012']
    naics_label_cols = ['NAICS2022_LABEL', 'NAICS2017_LABEL', 'NAICS2012_LABEL']

    # Combine available NAICS codes
    df['NAICS'] = None
    for col in naics_cols:
        if col in df.columns:
            df['NAICS'] = df['NAICS'].fillna(df[col])

    # Combine available NAICS labels
    df['NAICS_LABEL'] = None
    for col in naics_label_cols:
        if col in df.columns:
            df['NAICS_LABEL'] = df['NAICS_LABEL'].fillna(df[col])

    # Drop original columns that exist
    cols_to_drop = naics_cols + naics_label_cols
    df.drop(columns=[col for col in cols_to_drop if col in df.columns], inplace=True)

    return df
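Each gathering loop below converts the Census API's list-of-lists response by promoting the first row to column names. A small helper (hypothetical, not part of any Census library) can do that in one step and also gives a clean 0-based index, whereas `iloc[1:]` leaves the index starting at 1.

```python
import pandas as pd


def census_json_to_df(rows: list, year=None) -> pd.DataFrame:
    """Convert a Census API JSON payload (a list of lists whose first row is the
    header) into a DataFrame, optionally tagging every row with a `year` column."""
    if not rows:
        return pd.DataFrame()
    df = pd.DataFrame(rows[1:], columns=rows[0])
    if year is not None:
        df["year"] = year
    return df


# Shape of a typical response; the values here are made up for illustration
sample = [
    ["NAME", "B01003_001E", "state", "county"],
    ["Example County, Pennsylvania", "12345", "42", "129"],
]
print(census_json_to_df(sample, year=2023))
```

Inside a loop this would replace the three `pd.DataFrame(...)` / `columns = iloc[0]` / `iloc[1:]` lines with `census_json_to_df(get_data(params), year=i)`. Note that the API returns every value as a string, so numeric columns still need converting.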


### Gather Nonemployer Statistics (2012 - 2022)

In [None]:
nonemp_file_path = f"data/{NAICS}_{COUNTYCODE}_{STATEFIPS}_{nonemp_params['dataset_base'].replace('/', '_')}.csv"
if os.path.exists(nonemp_file_path):
    nonemp = pd.read_csv(nonemp_file_path)
else: 
    nonemp_all_years = []
    for i in range(2012, 2026):
        if i >= 2022:
            naics_year = 2022 
        elif i >= 2017:
            naics_year = 2017
        else:
            naics_year = 2012

        nonemp_params['year'] = i
        nonemp_params['variables'] = f'NAME,COUNTY,NAICS{naics_year},NAICS{naics_year}_LABEL,NESTAB,NRCPTOT,YEAR'

        try:
            nonemp = pd.DataFrame(get_data(nonemp_params))
            nonemp.columns = nonemp.iloc[0]
            nonemp = nonemp.iloc[1:]
            naics_nonemp = get_all_business_code_table(nonemp, NAICS, f'NAICS{naics_year}')
            nonemp_all_years.append(naics_nonemp)
            
            print(f'finished year {i}')
            
        except Exception as err:
            # Years not (yet) published by the API fail here; report why and move on
            print(f'failed year {i}: {err}')
    
    nonemp = combine_and_rename_naics(nonemp_all_years)
    nonemp.to_csv(nonemp_file_path, index=False)

nonemp.info()
nonemp.head()
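With the nonemployer table in hand, two quick indicators are average receipts per establishment and the year-over-year change in establishment counts for each NAICS level. A minimal sketch, assuming the column names produced by the cell above (`NAICS`, `YEAR`, `NESTAB`, `NRCPTOT`); the function name is hypothetical, and receipts stay in whatever units the API reports.

```python
import pandas as pd


def receipts_per_establishment(df: pd.DataFrame) -> pd.DataFrame:
    """Add average receipts per nonemployer establishment and the year-over-year
    change in establishment counts, computed separately for each NAICS code."""
    out = df.copy()
    # API values arrive as strings; coerce to numbers (unparseable values become NaN)
    out[["NESTAB", "NRCPTOT"]] = out[["NESTAB", "NRCPTOT"]].apply(pd.to_numeric, errors="coerce")
    out["YEAR"] = pd.to_numeric(out["YEAR"], errors="coerce")
    out = out.sort_values(["NAICS", "YEAR"])
    out["Receipts_per_Estab"] = out["NRCPTOT"] / out["NESTAB"]
    out["Estab_Change"] = out.groupby("NAICS")["NESTAB"].diff()
    return out


# Toy rows shaped like the nonemp table (values are made up)
toy = pd.DataFrame({
    "NAICS": ["812310", "812310"],
    "YEAR": ["2021", "2022"],
    "NESTAB": ["10", "12"],
    "NRCPTOT": ["500", "660"],
})
print(receipts_per_establishment(toy)[["YEAR", "Receipts_per_Estab", "Estab_Change"]])
```

In the notebook, `receipts_per_establishment(nonemp)` applies the same logic to every NAICS prefix level that `get_all_business_code_table` kept.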

In [None]:
create_data_dictionary(f"{base_url}/2022/{nonemp_params['dataset_base']}/variables.json", nonemp_params['variables'])

### Gather County Business Patterns (2012 - 2022)

In [None]:
cbp_file_path = f"data/{NAICS}_{COUNTYCODE}_{STATEFIPS}_{cbp_params['dataset_base'].replace('/', '_')}.csv"
if os.path.exists(cbp_file_path):
    cbp = pd.read_csv(cbp_file_path)
else: 
    cbp_all_years = []
    for i in range(2012, 2026):
        if i >= 2022:
            naics_year = 2022 
        elif i >= 2017:
            naics_year = 2017
        else:
            naics_year = 2012
        cbp_params['year'] = i
        cbp_params['variables'] = f"NAME,COUNTY,EMP,ESTAB,NAICS{naics_year},NAICS{naics_year}_LABEL,PAYANN,PAYANN_N,PAYQTR1,PAYQTR1_N,YEAR"

        try:
            cbp = pd.DataFrame(get_data(cbp_params))
            cbp.columns = cbp.iloc[0]
            cbp = cbp.iloc[1:]
            naics_cbp = get_all_business_code_table(cbp, NAICS, f'NAICS{naics_year}')
            cbp_all_years.append(naics_cbp)

            print(f'finished year {i}')
        except Exception as err:
            # Years not (yet) published by the API fail here; report why and move on
            print(f'failed year {i}: {err}')
    cbp = combine_and_rename_naics(cbp_all_years)
    cbp.to_csv(cbp_file_path, index=False)

cbp.info()
cbp.head()
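County Business Patterns adds employees and payroll, which support two simple ratios: employees per establishment and average annual pay per employee (a rough proxy for the local wage levels behind the budget's `employee_wages` line). A sketch with a hypothetical function name; it assumes `PAYANN` is annual payroll in thousands of dollars, as CBP reports it.

```python
import pandas as pd


def cbp_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """Employees per establishment and average annual pay per employee from CBP columns.

    PAYANN is in thousands of dollars, so it is multiplied by 1,000. Rows reporting
    zero employees are treated as missing so the pay ratio is NaN rather than inf.
    """
    out = df.copy()
    for col in ["EMP", "ESTAB", "PAYANN"]:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    employees = out["EMP"].where(out["EMP"] > 0)  # zero -> NaN
    out["Emp_per_Estab"] = out["EMP"] / out["ESTAB"]
    out["Avg_Annual_Pay"] = out["PAYANN"] * 1_000 / employees
    return out


# Toy rows shaped like the cbp table (values are made up)
toy = pd.DataFrame({"EMP": ["40", "0"], "ESTAB": ["8", "3"], "PAYANN": ["1000", "0"]})
print(cbp_ratios(toy)[["Emp_per_Estab", "Avg_Annual_Pay"]])
```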

In [None]:
create_data_dictionary(f"{base_url}/2022/{cbp_params['dataset_base']}/variables.json", cbp_params['variables'])

### Gather Economic Census (conducted every five years: 2012, 2017, 2022)

In [None]:
ecnbasic_file_path = f"data/{NAICS}_{COUNTYCODE}_{STATEFIPS}_{ecnbasic_params['dataset_base'].replace('/', '_')}.csv"
if os.path.exists(ecnbasic_file_path):
    ecnbasic = pd.read_csv(ecnbasic_file_path)
else: 
    ecnbasic_all_years = []
    for i in (2012, 2017, 2022):  # the Economic Census covers years ending in 2 and 7
        if i >= 2022:
            naics_year = 2022 
        elif i >= 2017:
            naics_year = 2017
        else:
            naics_year = 2012
        ecnbasic_params['year'] = i
        ecnbasic_params['variables'] = f"NAME,CBSA,COUNTY,CSA,EMP,ESTAB,NAICS{naics_year},NAICS{naics_year}_LABEL,PAYANN,PAYQTR1,YEAR"
        try:
            ecnbasic = pd.DataFrame(get_data(ecnbasic_params))
            ecnbasic.columns = ecnbasic.iloc[0]
            ecnbasic = ecnbasic.iloc[1:]

            naics_ecnbasic = get_all_business_code_table(ecnbasic, NAICS, f'NAICS{naics_year}')
            
            ecnbasic_all_years.append(naics_ecnbasic)
            print(f'finished year {i}')
        except Exception as err:
            # Years not (yet) published by the API fail here; report why and move on
            print(f'failed year {i}: {err}')
    ecnbasic = combine_and_rename_naics(ecnbasic_all_years)
    ecnbasic.to_csv(ecnbasic_file_path, index=False)

ecnbasic.info()
ecnbasic.head()

In [None]:
create_data_dictionary(f"{base_url}/2022/{ecnbasic_params['dataset_base']}/variables.json", ecnbasic_params['variables'])

### Gather American Community Survey 1-Year Data (2012 - 2023)

In [None]:
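# dataset_base 'acs/acs1' contains a slash; replace it so the CSV lands directly in data/
acs1_file_path = f"data/{NAICS}_{COUNTYCODE}_{STATEFIPS}_{acs1_params['dataset_base'].replace('/', '_')}.csv"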
if os.path.exists(acs1_file_path):
    acs1 = pd.read_csv(acs1_file_path)
else: 
    acs1_all_years = []
    for i in range(2012, 2026):
        acs1_params['year'] = i
        acs1_params['variables'] = "NAME,B19013_001E,B01003_001E"
        try:
            acs1 = pd.DataFrame(get_data(acs1_params))
            acs1.columns = acs1.iloc[0]
            acs1 = acs1.iloc[1:]     
            acs1['year'] = i   
            acs1_all_years.append(acs1)

            print(f'finished year {i}')
        except Exception as err:
            # Years not (yet) published by the API fail here; report why and move on
            print(f'failed year {i}: {err}')
    acs1 = pd.concat(acs1_all_years, ignore_index=True)
    acs1.to_csv(acs1_file_path, index=False)

acs1.info()
acs1.head()

In [None]:
create_data_dictionary(f"{base_url}/2022/{acs1_params['dataset_base']}/variables.json", acs1_params['variables'])

In [None]:
acs1_cols_to_rename = {
    'B19013_001E': 'Median_Household_Income',
    'B01003_001E': 'Population'
}

new_acs1_cols = list(acs1_cols_to_rename.values())
acs1.rename(columns=acs1_cols_to_rename, inplace=True)


In [None]:
acs1[new_acs1_cols] = acs1[new_acs1_cols].astype(float)

# ESTIMATE: Use a flat 20% tax rate
acs1["Estimated_Taxes"] = acs1["Median_Household_Income"] * 0.20
acs1["Disposable_Income"] = acs1["Median_Household_Income"] - acs1["Estimated_Taxes"]

# Population growth rate = (New - Old) / Old * 100, between consecutive available years
acs1.sort_values(by='year', inplace=True)
acs1['Pop_Growth'] = acs1['Population'].diff()
acs1['Pop_Growth_Rate'] = (acs1['Pop_Growth'] / acs1['Population'].shift(1) * 100).round(2)  # percent; NaN for the first year
acs1
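Year-to-year rates can be noisy, and the standard 2020 ACS 1-year estimates were not released, so one step in the series likely spans two years. A compound annual growth rate (CAGR) over the whole period summarizes the long-term trend in a single number. A minimal sketch; the function name and example figures are hypothetical.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate in percent: ((end / start) ** (1 / years) - 1) * 100."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return ((end_value / start_value) ** (1 / years) - 1) * 100


# Hypothetical example: a county shrinking from 100,000 to 95,000 residents over 10 years
print(round(cagr(100_000, 95_000, 10), 2))  # -0.51
```

With the sorted `acs1` table, take `first, last = acs1.iloc[0], acs1.iloc[-1]` and call `cagr(first['Population'], last['Population'], last['year'] - first['year'])`.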

### Gather American Community Survey 5-Year Data (2012 - 2023)

In [None]:
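# dataset_base 'acs/acs5' contains a slash; replace it so the CSV lands directly in data/
acs5_file_path = f"data/{NAICS}_{COUNTYCODE}_{STATEFIPS}_{acs5_params['dataset_base'].replace('/', '_')}.csv"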
if os.path.exists(acs5_file_path):
    acs5 = pd.read_csv(acs5_file_path)
else: 
    acs5_all_years = []
    for i in range(2012, 2026):    
        acs5_params['year'] = i
        acs5_params['variables'] = "NAME,C17002_002E,C17002_003E,B01003_001E,B19013_001E,B01002_001E,B15003_022E,B15003_023E,B15003_024E,B15003_025E,B01001_011E,B01001_012E,B01001_035E,B01001_036E,B19083_001E,C02003_008E,C02003_004E,C02003_003E,C02003_007E,C02003_006E,C02003_005E" 
        
        try:
            acs5 = pd.DataFrame(get_data(acs5_params))
            acs5.columns = acs5.iloc[0]
            acs5 = acs5.iloc[1:]     
            acs5['year'] = i    
            acs5_all_years.append(acs5)

            print(f'finished year {i}')
        except Exception as err:
            # Years not (yet) published by the API fail here; report why and move on
            print(f'failed year {i}: {err}')

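    # ignore_index keeps row labels unique across years
    acs5 = pd.concat(acs5_all_years, ignore_index=True)
    # Save to the ACS 5-year path (previously this overwrote the ACS 1-year file)
    acs5.to_csv(acs5_file_path, index=False)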

acs5.info()
acs5.head()

In [None]:
create_data_dictionary(f"{base_url}/2022/{acs5_params['dataset_base']}/variables.json", acs5_params['variables'])

In [None]:
search_variables_in_data_dictionary(f"{base_url}/2022/{acs5_params['dataset_base']}/variables.json", "years Sex by Age")

In [None]:
acs5.info()

In [None]:
acs5_cols_to_rename = {
    'B01003_001E': 'Population',
    'B19013_001E': 'Median_Income',
    'B01002_001E': 'Median_Age',
    'B15003_022E': 'Bachelors',
    'B15003_023E': 'Masters',
    'B15003_024E': 'Professional',
    'B15003_025E': 'Doctorate',
    'B01001_011E': 'M_25_29', 
    'B01001_012E': 'M_30_34',
    'B01001_035E': 'F_25_29', 
    'B01001_036E': 'F_30_34',
    'B19083_001E': 'Gini_Index',
    'C02003_008E': 'Other',
    'C02003_004E': 'Black',
    'C02003_003E': 'White',
    'C02003_007E': 'Native_Hawaiian_and_Other_Pacific_Islander',
    'C02003_006E': 'Asian',
    'C02003_005E': 'American_Indian_and_Alaska_Native',
}

In [None]:
# Rename columns for clarity
acs5.rename(columns=acs5_cols_to_rename, inplace=True)

In [None]:
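new_acs5_cols = list(acs5_cols_to_rename.values())
acs5[new_acs5_cols] = acs5[new_acs5_cols].astype(float)
# The ACS API uses large negative codes (e.g. -666666666) for unavailable estimates; treat them as missing
acs5[new_acs5_cols] = acs5[new_acs5_cols].mask(acs5[new_acs5_cols] < 0)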

In [None]:
# Residents with a bachelor's degree or higher
# (B15003 counts people aged 25+, but the rate below divides by total population)
acs5['Edu_High'] = acs5['Bachelors'] + acs5['Masters'] + acs5['Professional'] + acs5['Doctorate']
acs5['Edu_Rate'] = (acs5['Edu_High'] / acs5['Population']) * 100

# Share of residents aged 25–34 (prime working/spending age)
acs5['Age_25_34'] = acs5['M_25_29'] + acs5['M_30_34'] + acs5['F_25_29'] + acs5['F_30_34']
acs5['Age_25_34_Pct'] = (acs5['Age_25_34'] / acs5['Population']) * 100

# Simpson's Diversity Index example:
ethnic_cols = ['Other', 'Black', 'White', 
               'Native_Hawaiian_and_Other_Pacific_Islander', 
               'Asian', 'American_Indian_and_Alaska_Native']

ethnic_shares = acs5[ethnic_cols].div(acs5['Population'], axis=0)
acs5['Diversity_Index'] = 1 - (ethnic_shares ** 2).sum(axis=1)

# Normalize scores
scaler = MinMaxScaler()
features = ['Population', 'Median_Income', 'Edu_Rate', 'Diversity_Index', 'Age_25_34_Pct']
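# index=acs5.index keeps the scaled values aligned row by row with acs5 in the weighted score below
acs5_scaled = pd.DataFrame(scaler.fit_transform(acs5[features]), columns=features, index=acs5.index)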

# Weighted score
acs5['Score'] = (0.25 * acs5_scaled['Population'] +
                0.25 * acs5_scaled['Median_Income'] +
                0.20 * acs5_scaled['Edu_Rate'] +
                0.15 * acs5_scaled['Diversity_Index'] +
                0.15 * acs5_scaled['Age_25_34_Pct'])

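# Calculate poverty rate: people below the poverty line (C17002_002E: ratio under 0.50,
# C17002_003E: 0.50 to 0.99) as a share of total population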
acs5[['C17002_002E', 'C17002_003E']] = acs5[['C17002_002E', 'C17002_003E']].astype(float)
acs5["Poverty_Rate"] = (acs5["C17002_002E"] + acs5["C17002_003E"]) / acs5["Population"] * 100

# Sort by Median Income (for example)
top_tracts = acs5[acs5['year'] == 2023].sort_values(by='Median_Income', ascending=False).head(10)
print('Top 10 census tracts in the county by median household income (2023)')
display(top_tracts[['NAME', 'Population', 'Median_Income', 'Gini_Index', 'Edu_High', 'Edu_Rate', 'Median_Age', 'Age_25_34_Pct', 'Score', 'year']])
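For reference, the scoring cell above boils down to three formulas. This is a restatement of what the code computes: $n_{k,t}$ is the count of race group $k$ in row $t$, $N_t$ is that row's total population, and the minimum and maximum in the scaling step are taken over every row of `acs5` (all tracts and years together), because the scaler is fit on the whole frame.

$$
D_t = 1 - \sum_{k} p_{k,t}^{2}, \qquad p_{k,t} = \frac{n_{k,t}}{N_t}
$$

$$
\tilde{x}_{j,t} = \frac{x_{j,t} - \min_{s} x_{j,s}}{\max_{s} x_{j,s} - \min_{s} x_{j,s}}
$$

$$
\mathrm{Score}_t = 0.25\,\tilde{x}_{\mathrm{Pop},t} + 0.25\,\tilde{x}_{\mathrm{Income},t} + 0.20\,\tilde{x}_{\mathrm{Edu},t} + 0.15\,\tilde{x}_{\mathrm{Diversity},t} + 0.15\,\tilde{x}_{\mathrm{Age},t}
$$

For example, a tract split evenly between two groups has $D = 1 - (0.5^2 + 0.5^2) = 0.5$. Because the weights sum to 1 and each scaled feature lies between 0 and 1, every score also lies between 0 and 1. The C02003 columns used are single-race counts, so the shares do not sum to 1 in tracts where residents report two or more races.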


In [None]:
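# TODO: Business density (establishments per 1,000 residents) helps identify competition or opportunity.
# It requires joining CBP establishment counts (ESTAB) with ACS population by year, e.g.:
# merged['Business_Density'] = merged['ESTAB'] / merged['Population'] * 1000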
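One way to fill in the TODO above is to merge CBP establishment counts with ACS population on year. A minimal sketch; the function name is hypothetical, and it assumes a `year` column on both inputs (CBP's `YEAR` column needs renaming first).

```python
import pandas as pd


def business_density(establishments: pd.DataFrame, population: pd.DataFrame) -> pd.DataFrame:
    """Establishments per 1,000 residents, matched by year.

    `establishments` needs columns ['year', 'ESTAB']; `population` needs ['year', 'Population'].
    """
    est = establishments[["year", "ESTAB"]].copy()
    pop = population[["year", "Population"]].copy()
    # API values are strings while acs1['year'] is an int; make both numeric before merging
    est["year"] = pd.to_numeric(est["year"])
    est["ESTAB"] = pd.to_numeric(est["ESTAB"], errors="coerce")
    pop["year"] = pd.to_numeric(pop["year"])
    merged = est.merge(pop, on="year", how="inner")
    merged["Business_Density"] = merged["ESTAB"] * 1_000 / merged["Population"]
    return merged


# Toy inputs (values are made up)
est = pd.DataFrame({"year": ["2021", "2022"], "ESTAB": ["5", "6"]})
pop = pd.DataFrame({"year": [2021, 2022], "Population": [100_000.0, 120_000.0]})
print(business_density(est, pop))
```

In the notebook, first keep a single NAICS level and rename the year column, e.g. `cbp_812310 = cbp[cbp['NAICS'].astype(str) == NAICS].rename(columns={'YEAR': 'year'})`, then call `business_density(cbp_812310, acs1)` after the ACS 1-year rename cell has run.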

### Optional Next Steps:
- Use geopandas to map this data geographically.
- Add business data using sources like Yelp API or Google Places API for competitor analysis.
- Use clustering (sklearn) to group similar counties/tracts.
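A minimal k-means sketch of the clustering step, assuming scikit-learn's `KMeans` and the feature columns built earlier in this notebook. The function name, number of clusters, and seed are illustrative choices, not tuned values.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler


def cluster_tracts(df: pd.DataFrame, features: list, n_clusters: int = 3, seed: int = 0) -> pd.Series:
    """Group rows with similar (min-max scaled) feature profiles using k-means.

    Rows with any missing feature are left unlabeled (NaN) in the returned Series.
    """
    complete = df[features].dropna()
    scaled = MinMaxScaler().fit_transform(complete)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(scaled)
    return pd.Series(labels, index=complete.index, name="Cluster").reindex(df.index)


# Toy data with two obvious groups (values are made up)
toy = pd.DataFrame({
    "Median_Income": [20_000, 21_000, 22_000, 90_000, 91_000, 92_000],
    "Edu_Rate": [5, 6, 5, 40, 41, 42],
})
print(cluster_tracts(toy, ["Median_Income", "Edu_Rate"], n_clusters=2))
```

For one survey year: `latest = acs5[acs5['year'] == 2023].copy()`, then `latest['Cluster'] = cluster_tracts(latest, features)`, and compare cluster profiles with `latest.groupby('Cluster')[features].mean()`.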

## 3. Evaluate and Compare:  
- **Visit Potential Locations:** Conduct site visits to assess the suitability of each location firsthand
- **Cost Analysis:** Compare the costs of different locations, including:
    - rent
    - utilities
    - taxes
- **Infrastructure:** Assess the availability of essential infrastructure, such as 
    - transportation
    - utilities
    - internet access
- **Pros and Cons:** Create a list of pros and cons for each potential location to help make a decision
- **Long-Term Growth Potential:** Consider the long-term growth potential of the area and its ability to support your business  
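One common way to turn a pros-and-cons list into a decision is a weighted decision matrix: score each site on each criterion after the site visits, weight the criteria by importance, and rank by the weighted sum. The sketch below is illustrative only; the criteria, weights, and scores are hypothetical and should come from your own evaluation.

```python
import pandas as pd


def rank_locations(scores: pd.DataFrame, weights: dict) -> pd.DataFrame:
    """Weighted decision matrix: each criterion is scored 1 to 5 (higher is better),
    multiplied by its weight, and summed. Weights are normalized to sum to 1."""
    total = sum(weights.values())
    normalized = {criterion: w / total for criterion, w in weights.items()}
    ranked = scores.copy()
    ranked["Weighted_Score"] = sum(ranked[c] * w for c, w in normalized.items())
    return ranked.sort_values("Weighted_Score", ascending=False)


# Hypothetical site-visit scores (1 = poor, 5 = excellent); cost is scored so that cheaper = higher
site_scores = pd.DataFrame(
    {
        "Cost": [4, 2, 3],
        "Foot_Traffic": [2, 5, 4],
        "Parking": [5, 1, 4],
        "Infrastructure": [3, 4, 4],
        "Growth_Potential": [2, 4, 3],
    },
    index=["Site A", "Site B", "Site C"],
)
weights = {"Cost": 0.3, "Foot_Traffic": 0.3, "Parking": 0.2, "Infrastructure": 0.1, "Growth_Potential": 0.1}
print(rank_locations(site_scores, weights)["Weighted_Score"].round(2))
```

Normalizing the weights means they can be entered as any relative importance values (for example 3, 3, 2, 1, 1) without changing the ranking.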

## 4. Consider Legal and Institutional Factors: 
- **Business Licenses:** Research the necessary business licenses and permits required for your business type and location. 
- **Local Regulations:** Understand any local regulations or restrictions that may affect your business operations.
- **Government Incentives:** Explore any local or state government incentives or programs that may be available for businesses in specific areas.
- **Zoning Laws:** Understand local zoning regulations and restrictions to ensure compliance