# Goals

- Extract data and make it readable.
- Use that data to formulate text-based answers to questions.

# Data

- age; age of the user. NOTE: comes in as a string (convert to int)
- sex; sex of the user
- bmi; body mass index, a rough measure of body fat. NOTE: comes in as a string (convert to float)
- children; number of children. NOTE: comes in as a string (convert to int)
- smoker; whether the user is a smoker. NOTE: comes in as 'yes'/'no'
- region; region of the user
- charges; annual premium cost. NOTE: comes in as a string (convert to float)
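The conversions listed above can also be driven by a lookup table that maps each column to its converter. This is a minimal sketch that assumes the column names listed above. `CONVERTERS` and `convert_row` are hypothetical names, and the sample row is made up. Turning `smoker` into a `bool` is an extra step that the analyzer below does not take, since it compares against `'yes'`/`'no'` instead.

```python
# Hypothetical column -> converter table (an assumption, not the analyzer's code).
CONVERTERS = {
    'age': int,
    'bmi': float,
    'children': int,
    'smoker': lambda value: value == 'yes',  # 'yes'/'no' -> True/False
    'charges': float,
}


def convert_row(row):
    """Return a new dict with each known column converted from its CSV string.

    Columns without an entry in CONVERTERS (sex, region) are kept as strings.
    """
    return {key: CONVERTERS.get(key, str)(value) for key, value in row.items()}


# Made-up row shaped like one csv.DictReader would yield (all values are strings).
sample_row = {'age': '30', 'sex': 'female', 'bmi': '25.5', 'children': '2',
              'smoker': 'no', 'region': 'northeast', 'charges': '4500.25'}
print(convert_row(sample_row))
```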

# Analysis

For an insurance agent, most questions would relate to cost, e.g.:
- cost for a smoker vs. a non-smoker
- cost of having children
- cost at certain ages
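Each of these questions comes down to the same operation: group the rows by one column (`smoker`, `children` or `age`) and average `charges` within each group. Below is a minimal sketch of that grouping in plain Python. It assumes the rows have already been converted as described in the Data section. `average_charges_by` is a hypothetical helper that is not part of the analyzer below, and the rows are made up.

```python
from collections import defaultdict


def average_charges_by(rows, column):
    """Group rows by `column`; return {value: mean charges} rounded to 2 places."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[column]] += row['charges']
        counts[row[column]] += 1
    return {value: round(totals[value] / counts[value], 2) for value in totals}


# Made-up rows, already converted to numbers.
rows = [
    {'age': 30, 'children': 0, 'smoker': 'no', 'charges': 4000.0},
    {'age': 30, 'children': 2, 'smoker': 'yes', 'charges': 20000.0},
    {'age': 45, 'children': 2, 'smoker': 'no', 'charges': 9000.0},
]
print(average_charges_by(rows, 'children'))  # {0: 4000.0, 2: 14500.0}
print(average_charges_by(rows, 'smoker'))
print(average_charges_by(rows, 'age'))
```

One pass over the data answers a question for every group at once. The smoker vs. non-smoker comparison is just the two entries of `average_charges_by(rows, 'smoker')`.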

# Functionality

Created the class ```InsuranceAnalyzer```.
-	Initializing the class requires one argument: the path to the CSV file (in this case ```insurance.csv```).

-	```convert_strings_to_numbers``` takes the path to a CSV file as an argument. It converts the numeric columns from strings to ints and floats so they can be used in mathematical operations (```charges``` is also rounded to two decimal places). Returns a list of dictionaries, one per row, with those values converted.

-	```cost_of_smoking``` takes no args. This function finds and compares the average cost of insurance for smokers and non-smokers. Returns a formatted string stating the average cost for smokers and non-smokers, and the difference in cost between the two.

-	```calculate_median``` takes no args. Finds the median age of the insured persons in the dataset. Returns it rounded to an int.

-	```insurance_above_below_median``` takes an int as an arg, ```ask_age```. Compares it to the median from ```calculate_median```. Returns a string stating whether ```ask_age``` is older than, younger than, or the same as the median age.

-	```cost_by_region``` takes a str as an arg, ```region```. Returns a string stating the average cost of all insured persons in that region.

-	```cost_by_age``` takes an int as an arg, ```age```. Returns the average cost of all insured persons of that age.

-	```find_insured_user``` takes an arbitrary number of keyword arguments, ```**kwargs```. Each keyword is a key from the dictionaries in ```insurance_list```, and its value is the value to match (e.g. ```age=29, sex='male'```). Returns the matching users as a list of dictionaries.

-	```avg_cost_by_user``` takes an arbitrary number of keyword arguments, ```**kwargs```, matched the same way as in ```find_insured_user```. Returns the average cost of insurance across the users that match.
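The keyword filtering behind ```find_insured_user``` and ```avg_cost_by_user``` can be sketched on its own. Each keyword narrows the list further, so the criteria combine with AND. `find_rows` and `average_charges` are hypothetical stand-ins for the two methods, and the rows are made up. In this sketch, an empty match returns `None` instead of a message, so an empty list never reaches the division.

```python
def find_rows(rows, **criteria):
    """Keep only the rows whose value for every keyword equals the given value."""
    matches = rows
    for key, value in criteria.items():
        # Each pass filters the previous result, so criteria combine with AND.
        # row.get(key) returns None for an unknown key, so such rows never match.
        matches = [row for row in matches if row.get(key) == value]
    return matches


def average_charges(rows, **criteria):
    """Mean of `charges` over the matching rows, or None if nothing matches."""
    matches = find_rows(rows, **criteria)
    if not matches:
        return None
    return round(sum(row['charges'] for row in matches) / len(matches), 2)


# Made-up rows; values must already have their converted types (age=29, not '29').
rows = [
    {'age': 29, 'sex': 'male', 'region': 'southeast', 'charges': 3000.0},
    {'age': 29, 'sex': 'male', 'region': 'southeast', 'charges': 5000.0},
    {'age': 29, 'sex': 'female', 'region': 'southeast', 'charges': 7000.0},
]
print(average_charges(rows, age=29, sex='male', region='southeast'))  # 4000.0
print(average_charges(rows, region='northwest'))  # None
```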

I wrote all of this before I knew much about the pandas library. It would have been a huge help and made this a lot simpler.

Live and learn.
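For comparison, here is a rough sketch of the same questions in pandas. It assumes pandas is installed and uses a made-up three-row sample with the same columns as `insurance.csv`. It is not a drop-in replacement for the class below, because it returns numbers and tables rather than formatted sentences.

```python
import io

import pandas as pd

# Made-up sample with the same columns as insurance.csv; with the real file,
# use pd.read_csv('insurance.csv') instead.
csv_text = """age,sex,bmi,children,smoker,region,charges
30,female,25.5,2,no,northeast,4500.25
41,male,31.2,0,yes,southwest,38000.50
52,male,28.0,1,no,southwest,10500.75
"""

# read_csv infers int/float column types, which replaces convert_strings_to_numbers.
df = pd.read_csv(io.StringIO(csv_text))

# cost_of_smoking: mean charges for each smoker group.
print(df.groupby('smoker')['charges'].mean().round(2))

# calculate_median: median age.
print(df['age'].median())

# cost_by_region: mean charges for one region.
print(round(df.loc[df['region'] == 'southwest', 'charges'].mean(), 2))

# find_insured_user / avg_cost_by_user: combine boolean masks with &.
mask = (df['sex'] == 'male') & (df['region'] == 'southwest')
print(df[mask])
print(round(df.loc[mask, 'charges'].mean(), 2))
```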

```python
import csv
import statistics


class InsuranceAnalyzer:
    """Loads an insurance CSV and answers cost questions about the insured."""

    def __init__(self, insurance_csv):
        # Parse the CSV once; every method works on this list of dicts.
        self.insurance_list = self.convert_strings_to_numbers(insurance_csv)

    def convert_strings_to_numbers(self, insurance_csv):
        """Read the CSV and convert its numeric columns from str to int/float."""
        insurance_list = []
        # The csv module documentation recommends opening files with newline=''.
        with open(insurance_csv, newline='') as insurance_file:
            reader = csv.DictReader(insurance_file)
            for row in reader:
                row['age'] = int(row['age'])
                row['charges'] = round(float(row['charges']), 2)
                row['bmi'] = float(row['bmi'])
                row['children'] = int(row['children'])
                insurance_list.append(row)
        return insurance_list

    # TODO: clean up cost_of_smoking to take (smoker, **kwargs) as args, compute
    # the averages under all the other kwargs, and find the difference between them.
    def cost_of_smoking(self):
        """Compare the average charges of smokers and non-smokers."""
        smokers_cost = 0
        num_of_smokers = 0
        nonsmokers_cost = 0
        num_of_nonsmokers = 0

        for item in self.insurance_list:
            if item['smoker'] == 'yes':
                smokers_cost += item['charges']
                num_of_smokers += 1
            elif item['smoker'] == 'no':
                nonsmokers_cost += item['charges']
                num_of_nonsmokers += 1

        smokers_avg_cost = round(smokers_cost / num_of_smokers, 2) if num_of_smokers > 0 else 0
        nonsmokers_avg_cost = round(nonsmokers_cost / num_of_nonsmokers, 2) if num_of_nonsmokers > 0 else 0
        # Round the difference too, so floating-point error does not leak into the message.
        difference = round(smokers_avg_cost - nonsmokers_avg_cost, 2)

        # Adjacent f-strings are joined into one string. A backslash continuation
        # inside the literal would copy the source indentation into the output.
        return (f'The average cost of insurance for a smoker is: {smokers_avg_cost}. '
                f'The average cost of insurance for a non-smoker is: {nonsmokers_avg_cost}. '
                f'That is a difference of {difference}.')

    # Idea: compute MEDIAN (and an average) once in __init__ and store them as
    # attributes. For now the median is recomputed on every call.
    def calculate_median(self):
        """Return the median age of the insured, rounded to an int."""
        ages = [row['age'] for row in self.insurance_list]
        # statistics.median sorts internally, so no separate sort is needed.
        return round(statistics.median(ages))

    # TODO: use the median to find the average cost above/below the median age.
    def insurance_above_below_median(self, ask_age):
        """Say whether ask_age is older than, younger than, or equal to the median age."""
        # ask_age = int(input("What is your age?"))
        median = self.calculate_median()
        if ask_age > median:
            return f'You are older than the median age of: {median}'
        elif ask_age < median:
            return f'You are younger than the median age of: {median}'
        return f'You are the median age of: {median}'

    def cost_by_region(self, region):
        """Return the average charges for everyone insured in `region`."""
        charges = [row['charges'] for row in self.insurance_list if row['region'] == region]
        if not charges:
            return f'No insured persons found in the {region} region.'
        region_avg_cost = round(sum(charges) / len(charges), 2)
        return f'The average cost of insurance for the {region} region is ${region_avg_cost}.'

    def cost_by_age(self, age):
        """Return the average charges for everyone insured at `age`."""
        charges = [row['charges'] for row in self.insurance_list if row['age'] == age]
        if not charges:
            return f'No insured persons found at age {age}.'
        age_avg_cost = round(sum(charges) / len(charges), 2)
        return f'The average cost of insurance at age {age} is ${age_avg_cost}.'

    def find_insured_user(self, **kwargs):
        """Return the rows matching every keyword, e.g. age=29, sex='male'."""
        # Start with all the data.
        insurance_list = self.insurance_list
        for key, value in kwargs.items():
            # Each pass filters the previous result, drilling down one key at a time.
            insurance_list = [row for row in insurance_list if row.get(key) == value]
        # An empty list (rather than a message string) keeps the return type consistent.
        return insurance_list

    def avg_cost_by_user(self, **kwargs):
        """Return the average charges of the rows matching every keyword."""
        user_list = self.find_insured_user(**kwargs)
        if not user_list:
            return 'No matching records.'
        avg = round(sum(row['charges'] for row in user_list) / len(user_list), 2)
        return f'The average cost for an insured person with the selected parameters is {avg}.'


# Example usage; expects insurance.csv in the working directory.
insurance_analyzer = InsuranceAnalyzer('insurance.csv')
print(insurance_analyzer.cost_of_smoking())
print(insurance_analyzer.calculate_median())
print(insurance_analyzer.insurance_above_below_median(27))
print(insurance_analyzer.cost_by_region('northwest'))
print(insurance_analyzer.cost_by_region('southwest'))
male_29_southeast = insurance_analyzer.find_insured_user(age=29, sex='male', region='southeast')
print(male_29_southeast)
print(insurance_analyzer.avg_cost_by_user(age=29, sex='male', region='southeast'))
```
