# U.S. Medical Insurance Costs

## 1. Overview

### 1.1 Approach
1. Define the objective
2. Review the data
3. Set specific goals for the analysis
4. Analyze the data
5. Evaluate the results
6. Present findings

### 1.2 Project Objective
Analyze the contents of the `insurance.csv` file using Python, identify relevant trends in the data, and present the results of the analysis.

### 1.3 Trends to Identify
- Find the `region` with the highest average `charges`.
- Find the average `age` of the dataset.
- Is there a correlation between being a `smoker` and the `charges`?
- Find the average `charges` for each `age`.
- Find the average `charges` for each `sex`.
- Find the average `age` of people who have at least one child (`children` >= 1).

## 2. Program

### 2.0 Dependencies

    pip install numpy

### 2.1 Read the CSV File

In [None]:
import csv
import numpy as np

with open('insurance.csv', newline='') as read_csv:
    csv_content = csv.DictReader(read_csv)
    csv_dict = {}
    # Key each row by a 1-based id and store the id on the row as well.
    for row_id, row in enumerate(csv_content, start=1):
        row['id'] = row_id
        csv_dict[row_id] = row

# Uncomment the following line to validate that the CSV file was opened and read correctly.
# print(csv_dict)

### 2.2 Find the `region` with the highest average `charges`
#### 2.2.1 Define a Function to Create a Dictionary by Region
Define a function that takes the dictionary created from the CSV file and returns a new dictionary in which each key is a `region` and each value is the list of rows with that `region`.

In [None]:
def create_region_dict(csv_dict):
    csv_regions_dict = {}
    for row in csv_dict.values():
        # Group each row under its region, creating the list on first sight.
        csv_regions_dict.setdefault(row['region'], []).append(row)
    return csv_regions_dict

csv_regions_dict = create_region_dict(csv_dict)
# Uncomment the following line to validate the `create_region_dict` function
# print(csv_regions_dict)

#### 2.2.2 Define a Function to Get the Sum and Count of Charges in Each Region
Define a function that takes the dictionary by region, sums and counts the `charges` in each `region`, and saves the results to a new dictionary. It also returns the region with the highest total charges.

In [None]:
def get_regional_charges(csv_regions_dict):
    total_regional_charges_dict = {}
    max_charges = 0
    max_region = ''
    for region in csv_regions_dict:
        total_regional_charges_dict[region] = {'sum': 0, 'count': 0}
        for row in csv_regions_dict[region]:
            total_regional_charges_dict[region]['sum'] += float(row['charges'])
            total_regional_charges_dict[region]['count'] += 1
        # Track the region with the largest total (not average) charges.
        if total_regional_charges_dict[region]['sum'] > max_charges:
            max_charges = total_regional_charges_dict[region]['sum']
            max_region = region
    return total_regional_charges_dict, max_region, max_charges

total_regional_charges_dict, max_region, max_charges = get_regional_charges(csv_regions_dict)
# Uncomment the following lines to validate the `get_regional_charges` function
# print(str(total_regional_charges_dict) + '\n')
print("The " + max_region + " region has the highest total charges at $" + str(max_charges) + ".")

#### 2.2.3 Get Average Charges Function

In [None]:
def get_avg_charges(charges_dict):
    avg_charges_dict = {}
    for region in charges_dict:
        # Average = total charges / number of rows in the region.
        avg_charges_dict[region] = charges_dict[region]['sum'] / charges_dict[region]['count']
        print("The " + region + " region has an average charge amount of $" + str(avg_charges_dict[region]) + ".")
    return avg_charges_dict
        
avg_charges_dict = get_avg_charges(total_regional_charges_dict)
# Uncomment the following line to validate the `get_avg_charges` function
# print(avg_charges_dict)
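
The grouping, summing, and averaging steps in 2.2.1 through 2.2.3 can also be done in a single pass. The following is a minimal sketch rather than the notebook's own method: `average_by` and `sample_rows` are hypothetical names introduced for illustration, and the sample values are made up. It assumes rows shaped like `csv.DictReader` output, where every value is a string.

```python
from collections import defaultdict


def average_by(rows, key, value):
    """Return {group: mean of `value`} for rows grouped by `key`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[key]
        sums[group] += float(row[value])
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Hypothetical sample rows (not taken from insurance.csv).
sample_rows = [
    {'region': 'southwest', 'charges': '100.0'},
    {'region': 'southeast', 'charges': '300.0'},
    {'region': 'southwest', 'charges': '200.0'},
]

sample_averages = average_by(sample_rows, 'region', 'charges')
# Pick the group whose average is largest.
highest_region = max(sample_averages, key=sample_averages.get)
print(highest_region, sample_averages[highest_region])
```

With the notebook's data, the same helper could presumably be called as `average_by(csv_dict.values(), 'region', 'charges')`. Because the grouping column is a parameter, it would also apply to the per-`age` and per-`sex` averages.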

## 3.1 Find the average `age` within the dataset
#### 3.1.1 Define a Function to Get the Average Age Within the Dataset

In [None]:
def get_avg_age(csv_dict):
    total_ages = 0
    count_ages = 0
    for id in csv_dict:
        total_ages += float(csv_dict[id]['age'])
        count_ages += 1
    return total_ages/count_ages
avg_age = get_avg_age(csv_dict)
print("The average age within the dataset is " + str(avg_age) + " years.")

## 4.1 Is there a correlation between being a `smoker` and `charges`?
`smoker` is a binary variable (encoded as 1 or 0) and `charges` is continuous. A Pearson correlation coefficient can still be computed for this pair; in that case it is equivalent to the point-biserial correlation, which reflects the difference between the mean `charges` of smokers and non-smokers relative to the overall spread of `charges`. It measures only linear association, so it may not fully capture the relationship between the variables.
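
To illustrate how a correlation coefficient behaves with one binary and one continuous variable, the sketch below computes a Pearson coefficient by hand and compares it with the point-biserial formula, which is built from the two group means. The function names and the sample values are hypothetical and are not taken from `insurance.csv`; this is an illustration under those assumptions, not part of the analysis itself.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def point_biserial(flags, values):
    """Point-biserial correlation between 0/1 flags and continuous values."""
    n = len(values)
    group1 = [v for f, v in zip(flags, values) if f == 1]
    group0 = [v for f, v in zip(flags, values) if f == 0]
    mean_all = sum(values) / n
    # Population standard deviation (divides by n, not n - 1).
    std_all = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    mean_gap = sum(group1) / len(group1) - sum(group0) / len(group0)
    return mean_gap / std_all * math.sqrt(len(group1) * len(group0) / n ** 2)


# Hypothetical sample: 1 = smoker, 0 = non-smoker.
sample_smoker = [1, 0, 0, 1, 0]
sample_charges = [30000.0, 5000.0, 7000.0, 25000.0, 6000.0]

r_pearson = pearson(sample_smoker, sample_charges)
r_point_biserial = point_biserial(sample_smoker, sample_charges)
# The two values agree up to floating-point rounding.
print(r_pearson, r_point_biserial)
```

The agreement shows that, for a binary variable, the coefficient is driven by the gap between the two group means scaled by the overall spread and the group sizes.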
### 4.1.1 Define a Function to Get the Correlation Coefficient of `smoker` and `charges`

In [None]:
def get_smoker_charges(csv_dict):
    smoker = []
    charges = []
    for id in csv_dict:
        charges.append(float(csv_dict[id]['charges']))
        # Encode smoker status as 1 ('yes') or 0 (anything else).
        smoker.append(1 if csv_dict[id]['smoker'] == 'yes' else 0)
    return smoker, charges

smoker, charges = get_smoker_charges(csv_dict)
smoker_charge_correlation_coefficient = np.corrcoef(smoker, charges)[0,1]
print("The smoker charge correlation coefficient is: " + str(smoker_charge_correlation_coefficient))

## 5.1 Find the average `charges` by `age`

In [75]:
def get_avg_charges_by_age(csv_dict):
    ages_dict = {}
    for id in csv_dict:
        current_age = csv_dict[id]['age']
        if current_age not in ages_dict:
            ages_dict[current_age] = {'sum': 0, 'count': 0}
        ages_dict[current_age]['sum'] += float(csv_dict[id]['charges'])
        ages_dict[current_age]['count'] += 1

    # Average charges per age = sum of charges / number of rows at that age.
    ages_avg_charges_dict = {}
    for age in ages_dict:
        ages_avg_charges_dict[age] = ages_dict[age]['sum'] / ages_dict[age]['count']
    return ages_avg_charges_dict
ages_avg_charges_dict = get_avg_charges_by_age(csv_dict)
print(ages_avg_charges_dict)

{'19': 9747.909334558823, '18': 7086.2175563623205, '28': 9069.187564285712, '33': 12351.53298730769, '32': 9220.300290769232, '31': 10196.980573333332, '46': 14342.590638620688, '37': 18019.9118772, '60': 21979.418507391303, '25': 9838.365310714285, '62': 19163.85657347826, '23': 12419.820039642855, '56': 15025.515836538463, '27': 12184.701721428573, '52': 18256.26971931034, '30': 12719.110358148146, '34': 11613.52812076923, '59': 18895.869531599998, '63': 19884.998460869567, '55': 16164.54548846154, '22': 10012.932801785715, '26': 6133.825308571429, '35': 11307.182031200002, '24': 10648.015962142857, '41': 9653.745649629629, '38': 8102.733674, '36': 12204.476138, '21': 4730.464329642857, '48': 14632.50044517241, '40': 11772.25131, '58': 13878.9281116, '53': 16020.930755000003, '43': 19267.27865333333, '64': 23275.530837272723, '20': 10159.697736206897, '61': 22024.45760869565, '44': 15859.396587037038, '57': 16447.185250000002, '29': 10430.158727037038, '45': 14830.199856206897, '54'

## 6.1 Find the average `charges` by `sex`