# Section 1: Data Processing

## Importing standard Python libraries
This task relies on three modules from the Python standard library, imported as follows:

`import csv`, 
`import json`,
`from datetime import datetime`

In [6]:
import csv #Used to read the rows of the ACW csv file
import json #Used to read and write json files
from datetime import datetime #Used to parse and manipulate date and time objects


## Reading in the data and observing the headers

After importing the required libraries, we open the csv file with `with open("acw_user_data.csv")` and inspect its header row, so that each column can be indexed by position in the later manipulations of the data. The header is collected with a list comprehension of the form ``` [x for x in iterable if condition]```, where the condition keeps only the first row (index 0).

In [7]:
with open("acw_user_data.csv") as acw_users: #Opening the CSV file to read in the data
    acw_user_data= csv.reader(acw_users) #csv.reader yields each row of the file as a list of strings
    #enumerate pairs each row with its index: user_row is the index and user_col is the list of values in that row
    #The list comprehension keeps only the row at index 0, which is the header row
    acw_user_header =[user_col for user_row, user_col in enumerate(acw_user_data) if user_row == 0]
print(acw_user_header) #Printing the header; the result is a list holding one list of column names
           

[['Address Street', 'Address City', 'Address Postcode', 'Age (Years)', 'Distance Commuted to Work (miles)', 'Employer Company', 'Credit Card Start Date', 'Credit Card Expiry Date', 'Credit Card Number', 'Credit Card CVV', 'Dependants', 'First Name', 'Bank IBAN', 'Last Name', 'Marital Status', 'Yearly Pension (GBP)', 'Retired', 'Yearly Salary (GBP)', 'Sex', 'Vehicle Make', 'Vehicle Model', 'Vehicle Year', 'Vehicle Type']]
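
The comprehension above walks through every row only to keep the first one. As a minimal sketch of an alternative, the header can be taken with the built-in `next()` on the reader. The in-memory `sample_csv` below is a hypothetical two-column stand-in for `acw_user_data.csv`, used so that the example runs on its own, and `column_index` is an illustrative name that is not part of the notebook.

```python
import csv
import io

# Hypothetical stand-in for acw_user_data.csv, held in memory so the sketch is self-contained
sample_csv = io.StringIO("First Name,Dependants\nAda,2\nGrace,\n")

reader = csv.reader(sample_csv)
header = next(reader)   # the first row is the header: a flat list of column names
# Map each column name to its position, so later code can look positions up by name
column_index = {name: position for position, name in enumerate(header)}
rows = list(reader)     # the remaining rows are the data rows

print(header)
print(column_index["Dependants"])
```

Looking positions up by name in this way is one option for avoiding hard-coded indexes such as `user_column[10]`.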


## Casting of data types and re-ordering of the key-value pairs
Each flat csv row is converted into a nested structure, and every value is cast to an appropriate type such as `int`, `float` or `bool`. While casting, the key-value pairs are also re-ordered to give a well-organized data structure. The records are built with a list comprehension of the form ``` [x for x in iterable]```, wrapped in a `try`/`except` block that handles the `ValueError` raised when a value cannot be cast to its type.


In [12]:
with open("acw_user_data.csv") as acw_user: # Opening the csv file to read in the data
    acw_user_data= csv.reader(acw_user) #Using one of the methods of csv to read in the data
    #Re-ordered the key-value pairs of the acw_user_data and created a new list using list comprehension method
    try:
        acw_data_list = [{'First Name':str(user_column[11]),
                      'Last Name':str(user_column[13]),
                      'Age':int(user_column[3]),
                      'Sex':str(user_column[18]),
                      'Retired':(user_column[16].lower()== 'true'),
                      'Marital-status':str(user_column[14]),
                      'Yearly Salary (GBP)':float(user_column[17]),
                       'Yearly Pension (GBP)':float(user_column[15]),
                       'Company':str(user_column[5]),
                       'Commute Distance':float(user_column[4]),
                       'Vehicle':{'Make':str(user_column[19]),
                       'Model': str(user_column[20]),
                       'Year':int(user_column[21]),
                       'Type':str(user_column[22])    
            },
            'CreditCard':{ 'Start-date':str(user_column[6]),
                         'End-date':str(user_column[7]),
                         'Number':int(user_column[8]),
                         'CCV':int(user_column[9]),
                         'IBAN':str(user_column[12])    
            },
            'Dependants': (int(user_column[10])),
            'Address': {'Street':str(user_column[0]),
                         'City':str(user_column[1]),
                         'Postcode':str(user_column[2])    
              } 
        } for user_row,user_column in enumerate(acw_user_data) if user_row !=0]
    except ValueError:
        print("Problematic rows for dependants in casting empty strings")
        



Problematic rows for dependants in casting empty strings
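
The `try`/`except` above wraps the whole comprehension, so one bad value discards every record and the message does not say which row failed. One possible way to locate the failures is to attempt the cast one value at a time. The names `try_int` and `sample_cells` below are hypothetical and are not part of the original notebook.

```python
def try_int(text):
    """Return (True, value) if text can be cast to int, otherwise (False, None)."""
    try:
        return True, int(text)
    except ValueError:
        return False, None

# Hypothetical sample of raw 'Dependants' cells, as csv.reader would return them
sample_cells = ["2", "", "1", " ", "3"]

# Collect the 1-based position of every cell that cannot be cast to int
failed_positions = [position
                    for position, cell in enumerate(sample_cells, start=1)
                    if not try_int(cell)[0]]
print(failed_positions)  # prints [2, 4]
```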


## Resolving errors in the dependants column
A list comprehension with a condition, of the form ``` [x for x in iterable if condition]```, is used to identify the problematic rows: those whose dependants value is an empty string or a single space.

In [9]:
with open("acw_user_data.csv") as acw_user: #Opening the file to read it in
    acw_user_data= csv.reader(acw_user)#Reading the file  using a csv method
    #Created a list for the problematic rows for dependants using the list comprehension while placing some conditions
    Problematic_rows_for_dependents = [f"Problematic rows for Dependents in row {user_row + 1}" 
                                       for user_row, user_column in enumerate(acw_user_data) 
                                       if user_column[10] == ""  or  user_column[10] == " "]

print(Problematic_rows_for_dependents) #Printed out the list to visualize problematic rows for dependants

['Problematic rows for Dependents in row 23', 'Problematic rows for Dependents in row 111', 'Problematic rows for Dependents in row 181', 'Problematic rows for Dependents in row 207', 'Problematic rows for Dependents in row 272', 'Problematic rows for Dependents in row 274', 'Problematic rows for Dependents in row 276', 'Problematic rows for Dependents in row 360', 'Problematic rows for Dependents in row 462', 'Problematic rows for Dependents in row 470', 'Problematic rows for Dependents in row 581', 'Problematic rows for Dependents in row 638', 'Problematic rows for Dependents in row 681', 'Problematic rows for Dependents in row 727', 'Problematic rows for Dependents in row 824', 'Problematic rows for Dependents in row 867', 'Problematic rows for Dependents in row 919', 'Problematic rows for Dependents in row 933', 'Problematic rows for Dependents in row 985']


## Replacing problematic rows for dependants with meaningful values
The problematic dependants values are replaced with `0` whenever they are encountered, so that the type conversion required by task 2 can complete.

In [10]:
with open("acw_user_data.csv") as acw_user: # Opening the csv file to read in the data
    acw_user_data= csv.reader(acw_user) #Using one of the methods of csv to read in the data
    #Resolved the problematic rows for dependants in our data by replacing empty strings with 0
    acw_data_list = [{'First Name':str(user_column[11]),
                      'Last Name':str(user_column[13]),
                      'Age':int(user_column[3]),
                      'Sex':str(user_column[18]),
                      'Retired':(user_column[16].lower()== 'true'),
                      'Marital-status':str(user_column[14]),
                      'Yearly Salary (GBP)':float(user_column[17]),
                       'Yearly Pension (GBP)':float(user_column[15]),
                       'Company':str(user_column[5]),
                       'Commute Distance':float(user_column[4]),
                       'Vehicle':{'Make':str(user_column[19]),
                       'Model': str(user_column[20]),
                       'Year':int(user_column[21]),
                       'Type':str(user_column[22])    
            },
            'CreditCard':{ 'Start-date':str(user_column[6]),
                         'End-date':str(user_column[7]),
                         'Number':int(user_column[8]),
                         'CCV':int(user_column[9]),
                         'IBAN':str(user_column[12])    
            },
            'Dependants': (int(user_column[10])if user_column[10] !="" and user_column[10] !=" " else 0),
            'Address': {'Street':str(user_column[0]),
                         'City':str(user_column[1]),
                         'Postcode':str(user_column[2])    
              } 
        } for user_row,user_column in enumerate(acw_user_data) if user_row != 0]

for user_column in acw_data_list:#Iterating over our acw_data_list to view the fixed dependants column
        print(user_column['Dependants'])#printed out the dependant column to observe changes made to the dependants column
print('Problematic dependants rows have been fixed')

3
1
1
2
2
3
3
1
2
2
3
2
3
3
4
2
3
2
3
2
2
0
1
2
2
1
2
3
2
1
2
1
5
2
2
2
2
2
4
1
2
2
2
2
2
1
1
2
2
2
2
3
1
3
2
2
2
2
2
4
4
2
1
5
4
2
2
2
1
3
2
5
2
1
2
2
2
3
3
2
5
2
1
2
2
2
2
2
2
3
3
1
1
3
2
1
3
1
3
2
1
1
4
4
2
2
2
2
3
0
2
2
1
1
3
1
1
3
3
5
4
1
1
2
1
2
3
1
1
2
1
1
2
3
2
1
3
1
5
2
1
2
2
2
2
3
1
2
3
3
3
4
2
2
3
3
4
1
1
2
4
3
2
3
3
1
2
3
2
2
1
1
1
5
3
2
1
1
3
0
2
2
3
2
2
2
3
1
2
2
5
3
2
2
2
2
2
1
4
1
2
3
2
2
2
0
2
1
3
1
2
2
1
2
2
1
3
2
2
2
4
1
2
3
3
2
2
1
1
1
2
4
1
2
2
2
1
1
2
3
2
2
1
3
2
2
2
2
3
2
2
2
2
1
3
1
2
2
1
1
1
2
1
2
1
2
2
2
3
2
0
3
0
2
0
3
3
2
2
3
4
1
2
1
5
2
3
3
2
4
3
2
2
2
2
2
1
3
2
2
2
2
2
2
2
1
2
3
5
1
1
1
1
1
2
2
3
1
2
2
1
2
3
5
3
2
1
1
4
2
2
1
3
2
2
1
1
2
2
1
4
2
3
2
3
2
2
3
4
1
2
2
1
1
4
1
2
3
0
2
2
2
2
2
2
2
2
2
2
3
2
2
2
2
2
3
1
3
2
2
2
2
2
3
2
5
2
1
1
4
2
1
2
2
2
2
1
5
2
3
1
1
2
5
2
2
2
2
1
2
2
4
2
3
1
2
4
3
1
3
2
5
3
2
5
2
2
2
2
3
2
3
1
1
1
2
1
1
2
2
1
3
1
3
1
1
1
5
3
4
2
2
4
1
4
1
2
3
1
1
0
2
1
4
5
1
2
1
0
2
2
3
2
3
1
1
2
3
3
2
3
2
2
1
2
2
1
2
3
2
1
2
1
2
2
2
1
2
2
2



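## Writing all records to a Processed.json file in the JSON data format
All records are converted to a JSON string using `json.dumps`, and that string is written to the new file `Processed.json` using `json.dump`. The file is then read back with `json.load` to check the result. Because `json.dump` is given a string that is already serialised, the file holds a JSON string rather than a JSON array, so the contents must be decoded a second time with `json.loads` whenever the records themselves are needed.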

In [None]:
processed_object = json.dumps(acw_data_list) #Created a json string from the list of records using json.dumps
with open('Processed.json', mode='w') as json_file:#Opened the Processed.json file in write mode
    json.dump(processed_object, json_file) #Wrote the json string to the file; it is encoded again, so the file holds a json string
with open('Processed.json', mode='r') as json_file:#Opened Processed.json in read mode
    print(json.load(json_file)) #json.load returns the json string that was written, which is printed for inspection
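
Calling `json.dumps` and then `json.dump` encodes the data twice. As an alternative sketch, which is not what the cell above does, the list can be passed straight to `json.dump` so that a single `json.load` returns the records. The sample `records` and the file name `example_processed.json` are hypothetical, and a temporary directory is used so that nothing in the working folder is overwritten.

```python
import json
import os
import tempfile

# Hypothetical sample records standing in for acw_data_list
records = [{"First Name": "Ada", "Dependants": 2},
           {"First Name": "Grace", "Dependants": 0}]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "example_processed.json")  # hypothetical file name
    with open(path, mode="w") as json_file:
        json.dump(records, json_file, indent=2)  # serialise the list once, straight to the file
    with open(path, mode="r") as json_file:
        loaded = json.load(json_file)            # a single decode step returns the list

print(loaded == records)
```

With this approach, code that reads the file back needs only `json.load` and no further `json.loads` call.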

## Creation of additional files
Two additional files, `retired.json` and `employed.json`, have been created as requested by the employer. List comprehensions of the form ``` [x for x in iterable if condition]``` build the two lists, which are converted to json strings using `json.dumps` and then written to their respective files using `json.dump`.

In [None]:
#Retired data captured using a list comprehension that keeps the records whose 'Retired' flag is True
Retired_data = [retired for retired in acw_data_list if retired['Retired']]
#Employed data captured using a list comprehension that keeps the records whose 'Retired' flag is False
employed_data = [employed for employed in acw_data_list if not employed['Retired']]
Retired_json_dumps = json.dumps(Retired_data) #converted the retired data to a json string
employed_json_dumps = json.dumps(employed_data)#converted the employed data to a json string
with open('retired.json', mode='w') as retired_json:#Opened the retired json file in the write mode
    json.dump(Retired_json_dumps, retired_json)#Wrote the json string of those who have retired
with open('employed.json', mode='w') as employed_json:#Opened the employed json file in the write mode
    json.dump(employed_json_dumps, employed_json)#Wrote the json string of those who are employed


## Writing a function to resolve some issues with credit card entries
Using the imported `datetime` class to manipulate date and time objects, a function `remove_ccard` has been written to flag the credit card entries that have issues. It parses the `Start-date` and `End-date` of each card with `datetime.strptime` and the format `'%m/%y'`, subtracts the start year from the end year, and returns `True` when the difference is more than 10 years and `False` otherwise.

The flagged records are collected in a list, converted to a json string using `json.dumps`, and finally written to the json file `remove_ccard.json`.


In [None]:
def remove_ccard(user_col): #Returns True when a card's end year is more than 10 years after its start year
    Start_date = datetime.strptime(user_col['CreditCard']['Start-date'], '%m/%y')#Parsed the start date string (month/two-digit year) into a datetime object
    End_date = datetime.strptime(user_col['CreditCard']['End-date'], '%m/%y')#Parsed the end date string into a datetime object
    diff_in_years = End_date.year - Start_date.year #Difference between the end year and the start year
    return diff_in_years > 10 #True for cards to be removed, otherwise False
#Created a list to hold the data of cards to be removed
remove_cards_list = [user_col for user_col in acw_data_list if remove_ccard(user_col)]
#Converted the list created to a json string
remove_cards_json = json.dumps(remove_cards_list)
with open('remove_ccard.json', mode='w') as remove_cards_file:#Opening the file in the write mode
    json.dump(remove_cards_json, remove_cards_file)#Writes the json string into the remove card file
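
`remove_ccard` compares calendar years only, so a card running from 01/10 to 12/20 (almost eleven years) has a year difference of exactly 10 and is not flagged. If the intended rule were a span of more than ten years measured to the month, which is an assumption and not something the notebook states, a variant could count whole months instead. The function name `card_span_months` is hypothetical.

```python
from datetime import datetime

def card_span_months(start_text, end_text, date_format="%m/%y"):
    """Number of whole months between two month/year strings such as '01/10'."""
    start = datetime.strptime(start_text, date_format)
    end = datetime.strptime(end_text, date_format)
    return (end.year - start.year) * 12 + (end.month - start.month)

# The year-only difference is 10 here, but the span is 131 months (10 years and 11 months)
months = card_span_months("01/10", "12/20")
print(months, months > 10 * 12)  # prints 131 True
```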
    

## Some additional metrics which will be used for ranking customers
A list, `salary_commute_file`, holds the records together with a new metric that relates salary to the distance commuted. The file is opened using `with open('Processed.json') as processed_file` and read with `json.load`; because the file stores a json string, `json.loads` is then used to convert it to Python objects. Each record gains a new `key`, `Salary-Commute Per Mile`: when the commute distance is 1 mile or less the yearly amount is used undivided, otherwise it is divided by the commute distance. A function, `sort_salary_commute(user_col)`, which returns `user_col["Salary-Commute Per Mile"]`, supplies the sort key, and the data is sorted in ascending order using the `sorted` function. Indexing the result, e.g. `Sorted_salary_commute[0]` and `Sorted_salary_commute[-1]`, shows the lower and upper bounds of `Sorted_salary_commute`. Finally, `Sorted_salary_commute` is converted using the json method `json.dumps` and written to `commute.json` using the `json.dump` method.


In [None]:
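salary_commute_file = [] #Created a list to hold the salary_commute data
with open('Processed.json', mode='r') as processed_file:#Opened the file in read mode
    processed_json = json.load(processed_file)#Loaded the Processed.json file, which holds a json string
    processed_object =json.loads(processed_json) #Converted the json string to a list of Python dictionaries
    for user_col in processed_object:
        #The metric is based on the yearly salary, as the name of its key states
        if user_col["Commute Distance"] <=1: #For a commute of 1 mile or less the metric is the salary itself
            user_col["Salary-Commute Per Mile"] = user_col["Yearly Salary (GBP)"]
        else: #Otherwise the metric is the salary divided by the miles commuted
            user_col["Salary-Commute Per Mile"] = user_col["Yearly Salary (GBP)"] / user_col["Commute Distance"]
        salary_commute_file.append(user_col)
    
def sort_salary_commute(user_col): #Sort key: the salary-commute value of a record
    return user_col["Salary-Commute Per Mile"]

Sorted_salary_commute = sorted(salary_commute_file, key= sort_salary_commute, reverse = False) #Sorted in ascending order
print(Sorted_salary_commute[0]) #Record with the lowest salary-commute value
print(Sorted_salary_commute[-1]) #Record with the highest salary-commute value
sorted_salarycommute_json = json.dumps(Sorted_salary_commute) #Converted the sorted list to a json string
with open('commute.json', mode='w') as commute_file: #Opened commute.json in the write mode
    json.dump(sorted_salarycommute_json, commute_file) #json.dump returns None, so its result is not assigned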
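
The ranking step can be sketched on a small sample, assuming the metric is meant to be the yearly salary per mile commuted. The records in `sample_records` are hypothetical, and `salary_per_mile` is an illustrative helper name rather than a function from the notebook.

```python
# Hypothetical sample records; the key names follow the structure built earlier in the notebook
sample_records = [
    {"First Name": "Ada", "Yearly Salary (GBP)": 40000.0, "Commute Distance": 8.0},
    {"First Name": "Grace", "Yearly Salary (GBP)": 30000.0, "Commute Distance": 0.5},
    {"First Name": "Alan", "Yearly Salary (GBP)": 24000.0, "Commute Distance": 12.0},
]

def salary_per_mile(record):
    """Salary divided by miles commuted; a commute of 1 mile or less keeps the full salary."""
    distance = record["Commute Distance"]
    salary = record["Yearly Salary (GBP)"]
    return salary if distance <= 1 else salary / distance

for record in sample_records:
    record["Salary-Commute Per Mile"] = salary_per_mile(record)

# Sort in ascending order of the new metric
ranked = sorted(sample_records, key=lambda record: record["Salary-Commute Per Mile"])
ranked_names = [record["First Name"] for record in ranked]
print(ranked_names)  # prints ['Alan', 'Ada', 'Grace']
```

Keeping the calculation in one helper means the rule for short commutes is stated in a single place.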

# Section 2: Data Visualization

## Importing the required libraries for the data visualization
The libraries required for data visualization in this section are `pandas`, `seaborn` and `matplotlib`, which are imported below.

In [None]:
import pandas as pd #Pandas library will be used for data analysis and manipulations in dataframe structure
import seaborn as sns #This library will be used for graphical visualization of the data
import matplotlib.pyplot as plt #Contains a lot of plotting utilities for graphical visualization
%matplotlib inline

## Reading in the file and subsetting Salary and Age data series (Question 1)
The file is read in using the pandas function `read_csv`, and the salary and age columns are subset into `salary_age_series`. Missing values are then checked using the pandas method `isna()`, and `sum()` gives the total number of NaNs in each column.

In [None]:
acw_data = pd.read_csv('acw_user_data.csv') #reading in the acw_data using the pandas function read_csv
salary_age_series=acw_data[['Yearly Salary (GBP)', 'Age (Years)']] #subsetting salary and age from the acw_data
salary_age_series.isna().sum() #Counting the missing values (NaNs) in each column

## Finding the Mean Salary (Yearly) (Question 1)
The mean salary is found by subsetting `Yearly Salary (GBP)` from `salary_age_series` and applying the pandas method `mean()`. The result is assigned to the variable `mean_salary` and formatted to 2 decimal places when printed.

In [None]:
mean_salary=salary_age_series['Yearly Salary (GBP)'].mean() #Applying the pandas method mean() to find the mean of the subset data
print(f'The mean of the yearly salary is £{mean_salary:,.2f}') #printing out the result of mean_salary to 2 decimal places

## Finding the Median Age (Years)

The median age is found by subsetting `Age (Years)` from `salary_age_series` and applying the pandas method `median()`. The result is assigned to the variable `median_age` and formatted to `0` decimal places when printed.

In [None]:
median_age = salary_age_series['Age (Years)'].median() #Using the pandas library function median() to find the median on the subset data

print( f'The median of the age data series is {median_age:.0f}') #printing the result and rounding to 0 decimal places

## a. Age, calculating how many bins would be required for a bin_width of 5 (Question 2)
The number of bins required for a `binwidth` of `5` is found by setting `binwidth`, which allows `seaborn` to calculate the number of bins automatically. The plot is drawn with `sns.displot(data=salary_age_series['Age (Years)'], binwidth=5, height=5.5, aspect=2)`; `displot` is a figure-level function that returns a `FacetGrid`. The title of the plot is set using `fig.suptitle`, the label for the y-axis using `set_ylabels`, and `plt.show` is used to display the plot.


In [None]:
#Using a displot to create a facetgrid of the distribution plot of the age subset data using a binwidth of 5
age_dplt = sns.displot(data=salary_age_series['Age (Years)'], binwidth=5, height=5.5, aspect=2)
#Created a title for the plot using the fig.suptitle
age_dplt.fig.suptitle("Distribution of Age using a binwidth of 5", y=1.0, fontsize=12)
#added a label to the y-axis for clarity using the set_ylabels
age_dplt.set_ylabels("Frequency of Age (Years)")
plt.show() #This is used to display the output of the plot
print("") #Created some space
#Description of the observation of the plot
print("Fig 1.0: For a binwidth of 5, seaborn generated a total of 15 bins for our Age (Years) dataset as seen above")
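
The bin count can also be estimated by hand: divide the range of the data by the bin width and round up. The sketch below uses hypothetical minimum and maximum ages (`age_min`, `age_max`) rather than values read from the dataset, and seaborn's own placement of bin edges may differ by one bin in edge cases.

```python
import math

bin_width = 5
age_min, age_max = 18, 91  # hypothetical bounds, not taken from acw_user_data.csv

# Number of fixed-width bins needed to cover the range from age_min to age_max
bins_needed = math.ceil((age_max - age_min) / bin_width)
print(bins_needed)  # prints 15
```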

## b. Dependants, fixing data errors with seaborn itself

The `Dependants` column is cleaned to make it suitable for further analysis of the data. The column is first subset to find out some key information: `count()` gives the number of non-null values, while `isna()` followed by the pandas function `sum()` gives the total number of NaNs to be corrected.

In [None]:
Dependant_series=acw_data['Dependants'] #subsetting to generate the dependants column
print(f'Total number of non-null:{Dependant_series.count()}') #checking the total count of non-null values
print(f'Total number of nan values:{Dependant_series.isna().sum()}') #checking the total number of nans 


## Fixing the Data errors in the Dependants data series
Checking for all the unique values in the `Dependants` column using the pandas library method `unique()`

In [None]:
#Checking for all unique values in the dependant column
print(f" Unique values of Dependants:{acw_data['Dependants'].unique()}")

## Finding the mode of the dependants column
The pandas method `mode()` is used to find the mode of the dependants column, which will then be used to fill in the missing values in that column.

In [None]:
#Finding the mode to fill missing values
print(f" The mode of Dependants:{acw_data['Dependants'].mode()}")

## Filling missing values
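The NaNs are replaced with the mode of the dependants column using the pandas method `fillna()`. The filled column is assigned back to the dataframe so that the change takes effect, and `Dependant_series` is refreshed from it so that later checks see the filled values.

In [None]:
acw_data['Dependants'] = acw_data['Dependants'].fillna(value=2.0) #Filled the NaNs with the mode (2.0) and assigned the result back to the dataframe
Dependant_series = acw_data['Dependants'] #Refreshed the subset so that it reflects the filled column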

## Observation of the dependants column
The dependants column now has no missing values, which shows that the data errors have been corrected. The pandas methods `count()` and `isna()` were used to inspect the column, with the pandas function `sum()` giving the total number of missing values.

In [None]:
#Checking for non-null values
print(f'Total number of non-null:{Dependant_series.count()}')
#Checking for missing values
print(f'Total number of nan values:{Dependant_series.isna().sum()}') 
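
The fill value above is written as the literal `2.0`. As a hedged sketch, the value could instead be derived from the data, so that it stays correct if the file changes. The miniature dataframe `sample` is hypothetical and only stands in for `acw_data`.

```python
import pandas as pd

# Hypothetical miniature of the Dependants column, with two missing values
sample = pd.DataFrame({"Dependants": [2.0, None, 1.0, 2.0, None, 3.0]})

# mode() returns a Series because ties are possible, so take its first entry
most_common = sample["Dependants"].mode()[0]

# Assign the filled column back to the dataframe instead of filling in place
sample["Dependants"] = sample["Dependants"].fillna(most_common)
print(most_common, int(sample["Dependants"].isna().sum()))
```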


### Plotting the Dependants column after fixing the data errors
After fixing the errors in the dependants column, its distribution is plotted with the figure-level function `displot`, passing in parameters such as `bins`, `height` and `aspect`. The plot title is set using `fig.suptitle` and the y-axis label using `set_ylabels`.

In [None]:
#Created a facetgrid distribution plot
dependant_dplt = sns.displot(data=acw_data['Dependants'], bins=5, height=5.5, aspect=2)
#Set a title for the plot
dependant_dplt.fig.suptitle("Dependants distribution", y=1.0, fontsize=12)
#Set a label for the y-axis
dependant_dplt.set_ylabels("Frequency of Dependants")
plt.show() #Display the plot
# A description of the plot
print("Fig 1.2: The x-axis shows the distribution of data points in the Dependants column, while the y-axis shows the frequency of the Dependants column")

## c. Age (of default bins), conditioned on Marital Status
The figure-level function `displot` is used to visualize the age data conditioned on marital status. This is achieved by passing the column `Marital Status` to the `hue` parameter to set the condition. The x-axis contains the data points of age, while the y-axis shows the frequency of age. The plot title is set using `fig.suptitle`, the y-axis is labelled using `set_ylabels`, and `plt.show` is used to display the plot.

In [None]:
#Created a FacetGrid distribution plot and conditioned it on marital status
age_marital_dplt = sns.displot(data=acw_data, x= 'Age (Years)', hue ='Marital Status')
#Created a title for the plot
age_marital_dplt.fig.suptitle(' Univariate plot of Age (Years) conditioned on Marital status',y=1.0, fontsize=12)
#Labelled the y-axis for clarity
age_marital_dplt.set_ylabels("Frequency of Age (Years)")
#Displays the plot
plt.show()
#Description of the plot
print("Fig 1.3: The x-axis shows the distribution of data points in the Age (Years) column with default bins, while the y-axis shows the frequency of the Age (Years) column conditioned on marital status")

## Perform multivariate plots with the following data attributes (Question 3)

### a. Commuted distance against salary
A `relplot`, seaborn's figure-level function for relational plots, is used to show the relationship between `Distance Commuted to Work (miles)` and `Yearly Salary (GBP)`. The title of the plot is set using `plt.title` and the plot is displayed using `plt.show()`.

In [None]:
#Used a relplot which is a generalized plotting function to create the plot
commute_distance_salary_rlpt = sns.relplot(data=acw_data, x="Distance Commuted to Work (miles)", y="Yearly Salary (GBP)", height=5.5, aspect=2)
#Set a title for the plot
plt.title("commute distance from work and their Yearly Salary")
#Displayed the plot
plt.show()
#Description of the plot
print("Fig 1.4: The figure above shows the distance commuted to work against the yearly salary earned")

### b. Age against Salary
A `relplot`, seaborn's figure-level function for relational plots, is used to show the relationship between `Age (Years)` and `Yearly Salary (GBP)`. The title of the plot is set using `plt.title` and the plot is displayed using `plt.show()`.

In [None]:
#Used a relplot which is a generalized plotting function to create the plot
age_salary_rlpt = sns.relplot(data=acw_data, x="Age (Years)", y="Yearly Salary (GBP)", height=5.5, aspect=2)
#Set a title for the plot
plt.title("Age (Years) against Yearly Salary")
#Displays the plot
plt.show()
#Description of the plot
print("Fig 1.5: A relplot showing the relationship between age and salary, although the visualization does not say much")

## Showing where the line of best fit occurred among the data points
An `lmplot` is used to draw the line of best fit through the data points and so make more meaning of the data. The title of the plot is set using `plt.title` and the plot is displayed using `plt.show`.

In [None]:
#An lmplot has been used to derive more meaning from the data points
age_salary_plt = sns.lmplot(data=acw_data, x="Age (Years)", y="Yearly Salary (GBP)", height=5.5, aspect=2)
#Sets a title for the plot
plt.title("Age (Years) against Yearly Salary")
#Displays the plot
plt.show()
#Description of the plot
print("Fig 1.6: An lmplot showing the line of best fit through the data points of age against salary")

### c. Age against Salary conditioned by Dependants
An `lmplot` is used to draw the line of best fit through the data points, with the `hue` parameter set to the Dependants column to condition the plot. The title of the plot is set using `plt.title` and the plot is displayed using `plt.show`.


In [None]:
#An lmplot has been used to derive more meaning from the data points conditioned on the dependants data
age_salary_dependants_implt = sns.lmplot(data=acw_data, x="Age (Years)", y="Yearly Salary (GBP)", height=5.5, aspect=2, hue='Dependants')
# Sets a title for the plot
plt.title("Age (Years) against Yearly Salary")
#Displays the plot
plt.show()
#Description of the plot
print("Fig 1.7: An lmplot showing the lines of best fit through the data points of age against salary, conditioned on the Dependants column")

## Saving all the plots generated (Question 4)

In [None]:
age_dplt.savefig("./age_dplt.png") #Univariate saved plot for age
dependant_dplt.savefig("./dependant_dplt.png") #Univariate saved plot for dependants to show data corrections made
age_marital_dplt.savefig("./age_marital_dplt.png") #Univariate saved plot for age conditioned on marital status
#Multivariate saved plot of commuted distance on salary
commute_distance_salary_rlpt.savefig("./commute_distance_salary_rlpt.png") 
#Multivariate saved plot of age against salary
age_salary_rlpt.savefig("./age_salary_rlpt.png")
#Multivariate saved plot of age against salary showing the line of best fit of the data points
age_salary_plt.savefig("./age_salary_plt.png") 
#Multivariate saved plot of age against salary conditioned on dependants
age_salary_dependants_implt.savefig("./age_salary_dependants_implt.png") 
