# U.S. Medical Insurance Costs

# Predictive Insurance Cost Analysis

## Project Overview

This project analyzes medical insurance costs to uncover key insights and to develop a predictive model for estimating individual insurance charges. Using a dataset titled "Medical Cost Personal Data Sets," we examine seven attributes: age, sex, BMI, number of children, smoking status, region, and individual medical charges.

### Objectives

- **Data Exploration**: Understand the basic distribution and characteristics of the dataset through descriptive statistics.
- **Comparative Analysis**: Investigate the impact of smoking on insurance costs and analyze demographic patterns, such as age and region, in relation to insurance charges.
- **Predictive Modeling**: Develop a function capable of predicting insurance costs based on patient attributes, leveraging the insights gained from our analyses.

### Approach

The project is structured into several key steps, executed within a Jupyter notebook to facilitate both the analysis and documentation process:

1. **Data Parsing**: Load and organize the dataset into structured lists for analysis.
2. **Data Analysis**: Utilize the `PatientsInfo` class to methodically explore and analyze patient data across various dimensions.
3. **Statistical Insights**: Provide descriptive statistics to get an overview of the data's central tendencies and distributions.
4. **Predictive Model Development**: Construct a predictive model to estimate insurance costs based on relevant patient attributes.
5. **Summary of Findings**: Consolidate and communicate the key insights and implications of our analysis, including an assessment of the predictive model's accuracy and reliability.

Through this project, we aim to demonstrate the application of data science methodologies to real-world datasets, offering valuable insights and predictive capabilities that can inform both individuals and organizations in the healthcare sector.


### Step 1: Importing Necessary Libraries
In this step, we import Python's built-in `csv` module, which we need to read our dataset from a CSV file. The CSV (Comma-Separated Values) format is widely used for tabular data, and the `csv` module makes it easy to parse each row and access its fields.


```python
import csv
```

### Step 2: Parsing the Dataset
Using the `csv` module, we'll parse the data from `insurance.csv`. We will read the data into individual lists, one per column (age, sex, bmi, children, smoker, region, charges), converting numeric columns to numbers along the way. This step prepares the data for analysis.
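
The parsing step could be sketched as follows. This is a minimal illustration, assuming `insurance.csv` has a header row with exactly the column names listed above; the function name `parse_columns` and the two sample rows are made up so that the example runs without the real file.

```python
import csv
import io

# Made-up rows in the assumed insurance.csv layout, so the sketch runs on its own.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
25,female,24.5,0,no,northeast,3000.0
47,male,31.2,2,yes,southwest,24000.0
"""


def parse_columns(csv_file):
    """Read an open CSV file and return a dict mapping column name to a list."""
    names = ("age", "sex", "bmi", "children", "smoker", "region", "charges")
    columns = {name: [] for name in names}
    for row in csv.DictReader(csv_file):
        # Numeric columns are converted; categorical ones stay as strings.
        columns["age"].append(int(row["age"]))
        columns["sex"].append(row["sex"])
        columns["bmi"].append(float(row["bmi"]))
        columns["children"].append(int(row["children"]))
        columns["smoker"].append(row["smoker"])
        columns["region"].append(row["region"])
        columns["charges"].append(float(row["charges"]))
    return columns


# With the real file, pass open("insurance.csv", newline="") instead.
columns = parse_columns(io.StringIO(SAMPLE_CSV))
ages, charges = columns["age"], columns["charges"]
print(ages, charges)
```

Returning a dictionary of lists keeps the seven columns together while still allowing each one to be pulled out as its own list.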


### Step 3: Efficient Data Loading with a Helper Function
To avoid repeating the parsing code for every column, we will create a helper function that reads a given column from the CSV file into a list. This keeps the loading code short and reusable.
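
One possible shape for such a helper is sketched below. The name `load_list_data`, its parameters, and the optional `convert` argument are assumptions for illustration, not necessarily the notebook's actual signature.

```python
import csv


def load_list_data(csv_path, column_name, convert=str):
    """Return the values of one CSV column as a list.

    `convert` is applied to each raw string (for example int or float).
    """
    values = []
    with open(csv_path, newline="") as csv_file:
        for row in csv.DictReader(csv_file):
            values.append(convert(row[column_name]))
    return values
```

Under these assumptions, the columns would be loaded with calls such as `ages = load_list_data("insurance.csv", "age", int)` and `charges = load_list_data("insurance.csv", "charges", float)`. The file is re-read for each column, which is simple but not the fastest option for large files.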


### Step 4: Analysis with the `PatientsInfo` Class
Now that our data is organized, we're ready to analyze it. We will define a class called `PatientsInfo` with methods to investigate various attributes of the dataset. The class will include methods like `analyze_ages()`, `analyze_sexes()`, `unique_regions()`, `average_charges()`, and `create_dictionary()` to explore different aspects of the patient information. This object-oriented approach encapsulates our data and analysis functionality, making the code cleaner and more modular.
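
A minimal sketch of what such a class might look like is shown below. The method names come from the description above, but what each method computes and returns is an assumption made for illustration.

```python
from collections import Counter


class PatientsInfo:
    """Holds the parsed columns and offers simple analysis methods."""

    def __init__(self, ages, sexes, bmis, children, smokers, regions, charges):
        self.ages = ages
        self.sexes = sexes
        self.bmis = bmis
        self.children = children
        self.smokers = smokers
        self.regions = regions
        self.charges = charges

    def analyze_ages(self):
        """Return the average patient age (assumed behavior)."""
        return sum(self.ages) / len(self.ages)

    def analyze_sexes(self):
        """Return a count of patients per sex (assumed behavior)."""
        return dict(Counter(self.sexes))

    def unique_regions(self):
        """Return the distinct regions in sorted order."""
        return sorted(set(self.regions))

    def average_charges(self):
        """Return the mean of the charges column."""
        return sum(self.charges) / len(self.charges)

    def create_dictionary(self):
        """Return all columns in one dictionary keyed by attribute name."""
        return {
            "age": self.ages,
            "sex": self.sexes,
            "bmi": self.bmis,
            "children": self.children,
            "smoker": self.smokers,
            "region": self.regions,
            "charges": self.charges,
        }
```

An instance would be built from the seven lists produced in the parsing step, for example `patients = PatientsInfo(ages, sexes, bmis, children, smokers, regions, charges)`.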


### Step 5: Descriptive Statistics of the Dataset
To gain a comprehensive understanding of the data, we will provide descriptive statistics, including mean, median, mode, and standard deviation for numeric columns, and counts for categorical columns. This step gives us an overview of the dataset's distribution and central tendencies.
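
These statistics can be computed with Python's standard `statistics` module. The sketch below is one way to do it; the function names `describe_numeric` and `describe_categorical` are hypothetical, and `statistics.multimode` requires Python 3.8 or later.

```python
import statistics
from collections import Counter


def describe_numeric(values):
    """Return mean, median, mode(s), and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode returns every most-common value, so ties are not hidden.
        "mode": statistics.multimode(values),
        # stdev is the sample standard deviation and needs at least two values.
        "stdev": statistics.stdev(values),
    }


def describe_categorical(values):
    """Return how often each category occurs."""
    return dict(Counter(values))
```

For continuous columns such as BMI or charges, most values occur only once, so the mode is usually less informative than the mean and median.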


### Step 6: Comparative Analysis of Costs
One of the key questions is how medical charges differ between smokers and non-smokers, which helps us understand the impact of smoking on medical expenses. We will compare the average charges for both groups and use a statistical test to determine whether the difference is significant.
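
The comparison could be sketched as below. The text does not name a specific test, so a two-sided permutation test on the difference in means is used here as one option that needs only the standard library. The function names are hypothetical, and the code assumes the smoker column holds the strings `"yes"` and `"no"`.

```python
import random
from statistics import mean


def split_charges_by_smoker(smokers, charges):
    """Return (smoker_charges, non_smoker_charges); assumes 'yes'/'no' labels."""
    smoker_charges = [c for s, c in zip(smokers, charges) if s == "yes"]
    non_smoker_charges = [c for s, c in zip(smokers, charges) if s != "yes"]
    return smoker_charges, non_smoker_charges


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Adding 1 to both counts avoids reporting a p-value of exactly zero.
    return (extreme + 1) / (n_permutations + 1)
```

A small p-value (for example below 0.05) would suggest that the gap between the two group averages is unlikely to be due to chance alone.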


### Step 7: Predictive Modeling
To build on our analysis, we will create a function for predictive modeling. This function will use the attributes of the dataset to predict individual medical costs. We may use linear regression or another suitable model for this purpose. The goal is to develop a model that accurately predicts costs based on patient characteristics.
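
As one possible realization, the sketch below fits an ordinary least squares linear regression in pure Python by solving the normal equations. The choice of features (age, BMI, children, smoker), the `"yes"`/`"no"` encoding, and all function names are assumptions for illustration; sex and region are left out to keep the example short.

```python
def encode_features(age, bmi, children, smoker):
    """Turn patient attributes into numbers; smoker 'yes' becomes 1.0."""
    return [float(age), float(bmi), float(children), 1.0 if smoker == "yes" else 0.0]


def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; the model cannot be fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_linear_model(feature_rows, targets):
    """Return least-squares coefficients [intercept, w1, w2, ...]."""
    design = [[1.0] + [float(v) for v in row] for row in feature_rows]
    k = len(design[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_charges(coefficients, age, bmi, children, smoker):
    """Predict charges for one patient from fitted coefficients."""
    features = [1.0] + encode_features(age, bmi, children, smoker)
    return sum(c * f for c, f in zip(coefficients, features))
```

To judge how well such a model predicts, it would normally be fit on one part of the data and evaluated on held-out rows, for example by comparing predicted and actual charges with a mean absolute error.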


### Step 8: Summary of Findings
Finally, we will summarize the key findings from our analyses. This summary will highlight the most significant insights, including the impact of smoking on costs, demographic patterns, and the performance of our predictive model. This overview will provide clear, actionable insights derived from our data.
