# U.S. Medical Insurance Costs

## Project Goals
1. Demonstrate knowledge of Python fundamentals
2. Exercise analytical thought process
3. Demonstrate project management methods and documentation

## Project Data
### Constraints
The data source `insurance.csv` contains 1,338 records, covering people aged 18 through 64 who have 0 to 5 children.

In [3]:
import csv
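# newline="" is the setting the csv module recommends when opening files
with open("insurance.csv", newline="") as data_file:
    reader = csv.DictReader(data_file)
    data_list = list(reader)  # each record is a dict of column name -> string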

num_records = len(data_list)
print("Total Records: {}".format(num_records))

# Use min()/max() instead of 0 as an "unset" sentinel: 0 is a valid
# number of children, so a sentinel check would reset the running minimum.
record_ages = [int(record["age"]) for record in data_list]
record_children = [int(record["children"]) for record in data_list]
min_age, max_age = min(record_ages), max(record_ages)
min_children, max_children = min(record_children), max(record_children)
print("Max Age: {}".format(max_age))
print("Min Age: {}".format(min_age))
print("Max Children: {}".format(max_children))
print("Min Children: {}".format(min_children))

Total Records: 1338
Max Age: 64
Min Age: 18
Max Children: 5
Min Children: 0


### Analysis Objectives
##### 1. Define Demographics

In [4]:
from itertools import product

# Define the lists
ages = ['18-21', '22-35', '36-50', '51-64']
sex = ['male', 'female']
children = ['yes', 'no']
smoker = ['yes', 'no']
region = ['southeast', 'southwest', 'northeast', 'northwest']

# Create all possible combinations
combinations = list(product(ages, sex, children, smoker, region))

# Define the column names
columns = ['Age', 'Sex', 'Children', 'Smoker', 'Region']

# Create the dictionary with unique integer IDs
data_dict = {i: dict(zip(columns, combo)) for i, combo in enumerate(combinations, start=1)}

# Optional: print a sample
print(f"Total combinations: {len(data_dict)}")
print("Sample entry:", data_dict[1])

Total combinations: 128
Sample entry: {'Age': '18-21', 'Sex': 'male', 'Children': 'yes', 'Smoker': 'yes', 'Region': 'southeast'}
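
To use these demographic groups, each CSV record has to be mapped to one of the 128 combinations. The following is a minimal sketch of that mapping, not part of the original analysis: `age_band` and `demographic_key` are hypothetical helper names, and the column names `sex`, `smoker`, and `region` are assumed to match the CSV headers (only `age` and `children` are confirmed by the earlier cell).

```python
def age_band(age):
    """Return the age-band label that contains the given age."""
    if 18 <= age <= 21:
        return '18-21'
    if 22 <= age <= 35:
        return '22-35'
    if 36 <= age <= 50:
        return '36-50'
    if 51 <= age <= 64:
        return '51-64'
    raise ValueError(f"age {age} is outside the 18-64 range of the data set")


def demographic_key(record):
    """Map one CSV record to (Age, Sex, Children, Smoker, Region).

    Assumes the record is a dict of strings, as produced by csv.DictReader,
    with columns named 'age', 'sex', 'children', 'smoker', and 'region'.
    """
    has_children = 'yes' if int(record["children"]) > 0 else 'no'
    return (
        age_band(int(record["age"])),
        record["sex"],
        has_children,
        record["smoker"],
        record["region"],
    )


# Illustrative record with made-up values, not taken from insurance.csv
example = {"age": "19", "sex": "female", "children": "0",
           "smoker": "yes", "region": "southwest"}
print(demographic_key(example))
```

The tuple uses the same order as `columns`, so it can be compared directly against the entries in `combinations`.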


##### 2. Comparative Analysis
**Do smokers have higher medical charges than non-smokers?**

**How do charges differ between males and females?**

**Is there a difference in average charges between regions?**
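
All three comparative questions reduce to grouping the records by one column and averaging the charges within each group. One possible sketch follows; `mean_charges_by` is a hypothetical helper, the column name `charges` is an assumption about the CSV headers, and the sample records are made-up values used only to show the shape of the result.

```python
from collections import defaultdict


def mean_charges_by(records, column):
    """Return {group value: mean charges} for the given grouping column.

    Assumes each record is a dict of strings with a 'charges' column.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        key = record[column]
        totals[key] += float(record["charges"])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Made-up records for illustration only; not taken from insurance.csv
sample_records = [
    {"sex": "male", "smoker": "yes", "region": "southeast", "charges": "300.0"},
    {"sex": "female", "smoker": "no", "region": "southeast", "charges": "100.0"},
    {"sex": "male", "smoker": "no", "region": "northwest", "charges": "200.0"},
]

print(mean_charges_by(sample_records, "smoker"))
print(mean_charges_by(sample_records, "sex"))
print(mean_charges_by(sample_records, "region"))
```

On the real data, the same call would take `data_list` in place of `sample_records`. A difference in group means alone does not show that the difference is significant, so group sizes are worth reporting alongside the averages.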


##### 3. Correlation Analysis
**Does age correlate with medical charges?**

**How does the number of children affect medical charges?**

**Is there a relationship between smoking status and BMI?**
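
One way to approach these questions is the Pearson correlation coefficient, which measures the strength of a linear relationship on a scale from -1 to 1. The sketch below is a plain-Python version offered as an assumption about the method, since the notebook does not yet specify one; `pearson` is a hypothetical helper name.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


# Tiny made-up sequences to show the two extremes
print(pearson([1, 2, 3], [2, 4, 6]))   # perfectly increasing together
print(pearson([1, 2, 3], [6, 4, 2]))   # perfectly opposed
```

For age against charges, the inputs could be built as `[int(r["age"]) for r in data_list]` and `[float(r["charges"]) for r in data_list]`, assuming a `charges` column. For smoking status against BMI, smokers can be encoded as 1 and non-smokers as 0 before calling `pearson`, assuming `smoker` and `bmi` columns. Correlation describes association only and does not establish that one variable causes the other.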
