## Adding labels to Seaborn plots

**AIM**: This workbook describes how to add labels to Seaborn line charts.

### 1. Load required libraries

In [1]:
# Pandas and os for data ingestion and file manipulation
import pandas as pd
import os

Also load Seaborn and Matplotlib for plotting.

In [2]:
import seaborn as sns
import matplotlib.pyplot as plt 
sns.set_theme(style="darkgrid")

### 2. Build path to project folder 

In this section, I build the path to the project's data folder, from which I can load any .xlsx or .csv file into Python.

In [3]:
my_wd = os.getcwd()
print("My working directory is:",my_wd)

My working directory is: /home/pablo/Documents/Pablo_zorin/VS_Python_GitHub_Pablo_source/ML-using-Python/Seaborn_gallery


- I need to change the default working directory to the ML-using-Python folder so that I can reach its data sub-folder and ingest the Excel file called "INE total and foreign population figures Spain.xlsx"

In [4]:
ML_using_Python_folder = os.path.join('/home','pablo','Documents','Pablo_zorin','VS_Python_GitHub_Pablo_source','ML-using-Python')
print('My Python project folder is:',ML_using_Python_folder)

My Python project folder is: /home/pablo/Documents/Pablo_zorin/VS_Python_GitHub_Pablo_source/ML-using-Python


- Then change the working directory to this ML-using-Python folder

In [5]:
os.chdir(ML_using_Python_folder)


In [11]:
new_wd = os.getcwd()
print("Changed default working directory to:",new_wd)

Changed default working directory to: /home/pablo/Documents/Pablo_zorin/VS_Python_GitHub_Pablo_source/ML-using-Python
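Hard-coding the absolute path ties the notebook to one machine. As a minimal sketch of an alternative (the helper name find_project_root is hypothetical and not part of the original notebook), the project folder could be located by walking up from the current working directory until a folder with the expected name is found:

```python
from pathlib import Path


def find_project_root(start, project_name="ML-using-Python"):
    """Return the closest ancestor of `start` (or `start` itself) named `project_name`.

    Hypothetical helper for illustration only; raises FileNotFoundError
    when no folder with that name exists on the path.
    """
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if candidate.name == project_name:
            return candidate
    raise FileNotFoundError(f"No folder named {project_name!r} above {start}")


# Possible usage from inside Seaborn_gallery:
# project_folder = find_project_root(Path.cwd())
# data_folder = project_folder / "data"
```

With this approach the os.chdir() call becomes optional, since paths can be built from the returned folder directly.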


### 3. Check data folder file contents

- Check the contents of the data folder and build the path to the file to be imported into Python

In [6]:
data_folder = os.path.join('/home','pablo','Documents','Pablo_zorin','VS_Python_GitHub_Pablo_source','ML-using-Python','data')
data_folder_contents = os.listdir(data_folder)
print('data folder contents:',data_folder_contents)

data folder contents: ['wine_quality.zip', 'winequality.names', 'Monthly-AE-Time-Series-January-2024.xls', 'INE Resident population country of birth Spain.xlsx', 'AE_Time_Series_Data_website.txt', 'winequality-red.csv', 'OCDE_countries_population_figures_1970_2022.csv', 'all_wine_reset.csv', '03_INE_Spain_natural_growh_births_deaths.xlsx', '02 INE Spain CV population stocks and flows 2002 2025.xlsx', 'AE_Attendances_TypeI_2010_2025.csv', 'Monthly-AE-Time-Series-March-2025.xls', 'AE_Attendances_2010_2024.csv', 'Type_I_ATT_TEST.csv', 'Type_I_ATT_TRAIN.csv', 'ONS_Figure_2__Population_increase_in_mid-2023_was_driven_mostly_by_net_international_migration.xls', 'winequality-white.csv', 'INE total and foreign population figures Spain.xlsx', 'monthly-milk-production-pounds.csv', 'ONS_Figure_01_Long_term_emigration_immigration_net_migration.xlsx', 'ONS_long_term_immigration_end2024.xlsx', '01 INE resident population by nationality Spain and CV 2002 2024.xlsx']


#### 3.1 I want to import a .csv file for this script

- Scan the data_folder contents to list all .csv files. I want to import the "AE_Attendances_Aug2010_Mar_2025.csv" file, which includes Attendances and Admissions for the 2010-2025 time period.

In [8]:
for file_name in os.listdir(data_folder):
    # Print only the .csv files; everything else is skipped
    if file_name.endswith('.csv'):
        print(file_name)

winequality-red.csv
OCDE_countries_population_figures_1970_2022.csv
all_wine_reset.csv
AE_Attendances_TypeI_2010_2025.csv
AE_Attendances_Aug2010_Mar_2025.csv
AE_Attendances_2010_2024.csv
Type_I_ATT_TEST.csv
Type_I_ATT_TRAIN.csv
winequality-white.csv
monthly-milk-production-pounds.csv
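The same filter can be wrapped in a small reusable function. This is only a sketch: list_files_with_suffix is a hypothetical name, and the version below uses pathlib instead of os.listdir(), returns the names sorted, and compares extensions case-insensitively.

```python
from pathlib import Path


def list_files_with_suffix(folder, suffix=".csv"):
    """Return the sorted names of the files in `folder` whose extension matches `suffix`."""
    return sorted(
        entry.name
        for entry in Path(folder).iterdir()
        if entry.is_file() and entry.suffix.lower() == suffix.lower()
    )


# Possible usage in this notebook:
# for name in list_files_with_suffix(data_folder, ".csv"):
#     print(name)
```

Calling it with ".xlsx" instead would list the Excel workbooks in the same folder.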


### 4. Import Aug2010_Mar_2025 csv file into Python

- From the above set of files, I want to import "AE_Attendances_Aug2010_Mar_2025.csv" into Python and split it into Type I, Type II and Type III Attendances, as three independent .csv files. I will also create a new variable, "Total Attendances", as the sum of the three existing columns.

In [18]:
Attendances_file = os.path.join('data','AE_Attendances_Aug2010_Mar_2025.csv')

- Import the above .csv file into Python

In [19]:
AE_data = pd.read_csv(Attendances_file, parse_dates=True)
AE_data.head()

Unnamed: 0,Period,Type1_ATT,Type2_ATT,Type3_ATT
0,01/08/2010,1138652,54371,559358
1,01/09/2010,1150728,55181,550359
2,01/10/2010,1163143,54961,583244
3,01/11/2010,1111295,53727,486005
4,01/12/2010,1159204,45536,533001
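The plan stated at the start of this section (a total column plus one data set per attendance type) could be sketched as follows. The sample rows are copied from the head() output above; the column name Total_ATT and the output file names are assumptions made for illustration, not names taken from the notebook.

```python
import pandas as pd

# First rows copied from the AE_data.head() output above.
AE_sample = pd.DataFrame({
    "Period": ["01/08/2010", "01/09/2010", "01/10/2010"],
    "Type1_ATT": [1138652, 1150728, 1163143],
    "Type2_ATT": [54371, 55181, 54961],
    "Type3_ATT": [559358, 550359, 583244],
})

type_columns = ["Type1_ATT", "Type2_ATT", "Type3_ATT"]

# "Total_ATT" is a hypothetical column name for the planned total.
AE_sample["Total_ATT"] = AE_sample[type_columns].sum(axis=1)

# One small frame per attendance type, each keeping the Period column.
per_type = {col: AE_sample[["Period", col]] for col in type_columns}

# Writing them out would take one to_csv call per frame. The file names
# are placeholders, so the loop is left commented out.
# for col, frame in per_type.items():
#     frame.to_csv(f"{col}.csv", index=False)

print(AE_sample[["Period", "Total_ATT"]])
```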


Get some dataset info:

In [20]:
AE_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 176 entries, 0 to 175
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Period     176 non-null    object
 1   Type1_ATT  176 non-null    int64 
 2   Type2_ATT  176 non-null    int64 
 3   Type3_ATT  176 non-null    int64 
dtypes: int64(3), object(1)
memory usage: 5.6+ KB


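- The pd.to_datetime() function in Pandas turns the Period column into a datetime column. Note that the raw strings appear to be day-first (dd/mm/yyyy, since the file starts in August 2010), while the default parsing used here reads 01/08/2010 as 8 January 2010; passing dayfirst=True or format='%d/%m/%Y' would read it as 1 August 2010.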

In [21]:
AE_data['Period'] = pd.to_datetime(AE_data['Period'])
AE_data.head()

Unnamed: 0,Period,Type1_ATT,Type2_ATT,Type3_ATT
0,2010-01-08,1138652,54371,559358
1,2010-01-09,1150728,55181,550359
2,2010-01-10,1163143,54961,583244
3,2010-01-11,1111295,53727,486005
4,2010-01-12,1159204,45536,533001
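The file name indicates that the series starts in August 2010, so 01/08/2010 is presumably 1 August 2010 rather than 8 January 2010 as shown in the output above. A minimal sketch of day-first parsing, assuming every value in Period follows dd/mm/yyyy as the sample rows suggest:

```python
import pandas as pd

# Sample strings copied from the Period column of the raw file.
raw_period = pd.Series(["01/08/2010", "01/09/2010", "01/10/2010"])

# An explicit format removes any ambiguity between day-first and
# month-first interpretations.
period = pd.to_datetime(raw_period, format="%d/%m/%Y")

print(period.dt.strftime("%Y-%m-%d").tolist())
# ['2010-08-01', '2010-09-01', '2010-10-01']
```

An explicit format also makes the call fail loudly if a value does not match, which is usually preferable to a silent misread.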


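- But I want the Period column to hold plain dates, so I append the .dt.date accessor to the previous code. The result is a column of Python date objects, which Pandas stores with the object dtype.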

In [22]:
AE_data['Period'] = pd.to_datetime(AE_data['Period']).dt.date
AE_data.head()

Unnamed: 0,Period,Type1_ATT,Type2_ATT,Type3_ATT
0,2010-01-08,1138652,54371,559358
1,2010-01-09,1150728,55181,550359
2,2010-01-10,1163143,54961,583244
3,2010-01-11,1111295,53727,486005
4,2010-01-12,1159204,45536,533001


In [23]:
AE_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 176 entries, 0 to 175
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Period     176 non-null    object
 1   Type1_ATT  176 non-null    int64 
 2   Type2_ATT  176 non-null    int64 
 3   Type3_ATT  176 non-null    int64 
dtypes: int64(3), object(1)
memory usage: 5.6+ KB
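Period is still reported as object because .dt.date returns Python date objects rather than a Pandas datetime type. If a datetime dtype is preferred (for example, to keep using the .dt accessor later), one option is .dt.normalize(), which keeps the datetime type and sets the time to midnight. A small sketch on made-up timestamps:

```python
import pandas as pd

# Illustrative timestamps, not taken from the data set.
stamps = pd.to_datetime(pd.Series(["2010-08-01 09:30:00", "2010-09-01 17:45:00"]))

as_python_dates = stamps.dt.date      # Python datetime.date objects, object dtype
as_midnight = stamps.dt.normalize()   # still a datetime column, time set to 00:00

print(as_python_dates.dtype)
print(as_midnight.dtype)
```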


- The remaining columns can be converted to numeric with the pandas.to_numeric() function if needed. Here info() already reports them as int64, so the conversion is optional.
- AE_data["Type1_ATT"] = pd.to_numeric(AE_data["Type1_ATT"])
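When several columns need the conversion, pd.to_numeric() can be applied to all of them at once. The frame below is hypothetical (the real columns already load as int64); it only illustrates what happens when counts arrive as text and one entry is malformed.

```python
import pandas as pd

# Hypothetical frame in which the counts arrived as text, one of them malformed.
text_counts = pd.DataFrame({
    "Type1_ATT": ["1138652", "1150728", "n/a"],
    "Type2_ATT": ["54371", "55181", "54961"],
})

# errors="coerce" turns unparseable entries into NaN instead of raising.
numeric_counts = text_counts.apply(pd.to_numeric, errors="coerce")

print(numeric_counts.dtypes)
print(numeric_counts)
```

A column that contains a coerced NaN ends up as a float column, so it is worth checking isna() after the conversion.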

In [24]:
AE_data.columns

Index(['Period', 'Type1_ATT', 'Type2_ATT', 'Type3_ATT'], dtype='object')

In [25]:
AE_data.head()

Unnamed: 0,Period,Type1_ATT,Type2_ATT,Type3_ATT
0,2010-01-08,1138652,54371,559358
1,2010-01-09,1150728,55181,550359
2,2010-01-10,1163143,54961,583244
3,2010-01-11,1111295,53727,486005
4,2010-01-12,1159204,45536,533001
