<i>LEARNING-CIRCLE-104 PROJECT</i>

# PANDAS

Pandas is a powerful library for data manipulation and analysis in Python. It consists of several key modules and components that together form its backbone, enabling users to efficiently handle, manipulate, and analyze data. Some of the main modules and components in pandas are as follows:

##  Data Structures:
   - **Series:**
   A Series is a one-dimensional array-like object containing a sequence of values of the same type and an associated array of data labels, called its index. The simplest Series is formed from only an array of data, in which case pandas assigns a default integer index.
   - **DataFrame:**
   A DataFrame is a two-dimensional data structure that stores data of different types (strings, integers, floating-point values, categorical data and more) in columns. It is similar to a spreadsheet, a SQL table or a data frame in R. It contains an ordered, named collection of columns, each of which can hold a different value type (numeric, string, Boolean, etc.). A DataFrame has both a row index and a column index; it can be thought of as a dictionary of Series that all share the same index. </br>
   __Note:__
_While a DataFrame is physically two-dimensional, you can use it to represent higher dimensional data in a tabular format using hierarchical indexing_
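To make the note on hierarchical indexing concrete, here is a minimal sketch. The sales figures and the `Year`/`Quarter` level names are made up for illustration; they are not part of the project dataset.

```python
import pandas as pd

# Hypothetical data: sales indexed by (Year, Quarter), a two-level row index.
index = pd.MultiIndex.from_product(
    [[2022, 2023], ["Q1", "Q2"]], names=["Year", "Quarter"]
)
sales = pd.DataFrame(
    {"Product_A": [10, 12, 14, 16], "Product_B": [20, 22, 24, 26]},
    index=index,
)

# Selecting on the outer level returns the two-dimensional slice for that year.
sales_2023 = sales.loc[2023]
print(sales_2023)

# Selecting on both levels returns a single row as a Series.
q2_2023 = sales.loc[(2023, "Q2")]
print(q2_2023["Product_B"])
```

The table stays two-dimensional, but the two index levels together act like an extra dimension.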

In [1]:
# Importing the Pandas Package
import pandas as pd   # Here pd is an 'alias'  for the pandas library

### Series

#### **Creating a Series:**</br>
You can create a Series using the `pd.Series()` constructor. 

In [2]:
# Creating a series from a list
data = [10,20,30,40,50]
index =['A','B','C','D','E']
obj = pd.Series(data,index=index)

In [3]:
# Reading the series
obj

A    10
B    20
C    30
D    40
E    50
dtype: int64
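A Series can also be built without an explicit index, or from a dictionary. The short sketch below uses made-up values to show both cases.

```python
import pandas as pd

# No index given: pandas assigns default integer labels 0..n-1.
default_idx = pd.Series([10, 20, 30])
print(default_idx.index.tolist())

# From a dictionary: the keys become the index labels.
from_dict = pd.Series({"A": 10, "B": 20, "C": 30})
print(from_dict["B"])
```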

#### **Accessing Elements:**</br>
You can access elements in a Series using index labels:

In [4]:
obj['C']

30
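Besides plain square brackets, a Series supports explicit label-based (`loc`) and position-based (`iloc`) access. The sketch below rebuilds the same Series so that it runs on its own.

```python
import pandas as pd

obj = pd.Series([10, 20, 30, 40, 50], index=["A", "B", "C", "D", "E"])

by_label = obj.loc["C"]          # label-based lookup
by_position = obj.iloc[2]        # position-based lookup (third element)
several = obj.loc[["A", "E"]]    # a list of labels returns a Series
label_slice = obj.loc["B":"D"]   # label slices include both endpoints

print(by_label, by_position)
print(several)
print(label_slice)
```

Note that a label slice includes its end label, unlike a positional slice.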

#### **Series Operations:**</br>
You can perform various operations on Series, such as addition and multiplication:

In [5]:
obj * 2

A     20
B     40
C     60
D     80
E    100
dtype: int64
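Arithmetic between two Series aligns values on their index labels rather than on position. The sketch below uses two small made-up Series whose indexes only partly overlap.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["A", "B", "C"])
b = pd.Series([10, 20, 30], index=["B", "C", "D"])

# Labels present in only one Series produce NaN in the result.
total = a + b
print(total)

# add() with fill_value treats a missing label as 0 instead.
filled = a.add(b, fill_value=0)
print(filled)
```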

#### **Filtering Data:**</br>
Filtering data based on conditions:

In [6]:
obj[obj > 30]

D    40
E    50
dtype: int64
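Conditions can be combined with `&` (and) and `|` (or). Each condition needs its own parentheses because of operator precedence. A small sketch on the same Series:

```python
import pandas as pd

obj = pd.Series([10, 20, 30, 40, 50], index=["A", "B", "C", "D", "E"])

# Both conditions must hold.
middle = obj[(obj > 10) & (obj < 50)]

# Either condition may hold.
ends = obj[(obj == 10) | (obj == 50)]

# Filtering on the index labels instead of the values.
picked = obj[obj.index.isin(["A", "C"])]

print(middle)
print(ends)
print(picked)
```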

#### **Series Attributes:**</br>
Series has attributes like index and values:

In [7]:
# Getting indexes
obj.index

Index(['A', 'B', 'C', 'D', 'E'], dtype='object')

In [8]:
# Getting values
obj.values

array([10, 20, 30, 40, 50], dtype=int64)

#### **Name and Index:**</br>
You can give the Series a name, and give its index a name as well:

In [9]:
obj.name = 'Cell-Reference'
obj.index.name = 'Alphabet'
obj

Alphabet
A    10
B    20
C    30
D    40
E    50
Name: Cell-Reference, dtype: int64

### When to use Series
Series are a fundamental building block for data analysis in pandas.


A pandas Series is a good fit when all of the following hold:
- You want to work with a single variable or feature.
- You want to perform efficient data manipulations.
- You want to take advantage of the rich functionality provided by the pandas library.


### Dataframe
#### Creating a Dataframe
We can create a dataframe by calling the `DataFrame()` constructor. The main arguments of the constructor are the data, index and columns. The data can come from other data structures (lists, dictionaries or NumPy arrays) or from a file.

Let's start our examples by first importing the Pandas library:

In [10]:
import pandas as pd

#### Dataframes from other Data Structures

We now look at some examples on how to create a dataframe from various data structures in Python.

Using **lists**, we need to create a list of lists (nested list) with the relevant data. We then also need to pass an index (staff names in this example) and column names.

In [11]:
# Create list of lists containing data.
list_df = [["HR", 25, 50000], ["IT", 26, 60000], ["Finance" ,27, 45000]]

# Create index - staff names.
index = ['Idris', 'Doyin', 'Temitayo']

# Create column names.
columns = ['Department', 'Age', 'Salary']

# Create dataframe by passing in data, index and columns.
pd.DataFrame(data=list_df, index=index, columns=columns)

Unnamed: 0,Department,Age,Salary
Idris,HR,25,50000
Doyin,IT,26,60000
Temitayo,Finance,27,45000


Using **dictionaries**, we need to create a dictionary with the relevant data.

The keys should be the column names, while the values should be the data entries for that column. We can optionally pass an index; if we do not, pandas assigns default integer labels. Note that because the keys account for the column names, we don't have to pass in an argument for columns.

In [12]:
# Constructing a dataframe
# There are many ways to construct a DataFrame, though one of the most common is from a dictionary of equal-length lists
data = {"Name": ["Idris", "Doyin", "Temitayo", "Abdullahi", "Yima"],
        "Age": [25,26,27,28,29],
        'Salary': [50000, 60000, 45000, 70000, 55000],
    'Department': ['HR', 'IT', 'Finance', 'IT', 'Marketing']}

# Creating a dataframe from the data above
frame = pd.DataFrame(data)

# Reading the dataframe
frame

Unnamed: 0,Name,Age,Salary,Department
0,Idris,25,50000,HR
1,Doyin,26,60000,IT
2,Temitayo,27,45000,Finance
3,Abdullahi,28,70000,IT
4,Yima,29,55000,Marketing
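As a small variation, the sketch below passes an explicit index together with the dictionary, so the rows are labeled by staff name instead of by 0, 1, 2. It uses a three-row subset of the data above.

```python
import pandas as pd

data = {
    "Age": [25, 26, 27],
    "Salary": [50000, 60000, 45000],
}

# The index list must have the same length as the column lists.
frame_named = pd.DataFrame(data, index=["Idris", "Doyin", "Temitayo"])
print(frame_named)

# Rows can now be looked up by name.
print(frame_named.loc["Doyin", "Salary"])
```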


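Using **numpy arrays**, we need to create a numpy array with the relevant data. We then also need to pass an index (staff names) and column names. Note that a numpy array holds a single data type, so mixing strings and numbers in one array stores every value as a string.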

In [13]:
# First, we import the numpy package
import numpy as np

In [14]:
# Create numpy array containing data.
array_df = np.array([["HR", 25, 50000], ["IT", 26, 60000], ["Finance" ,27, 45000]])

# Create index - staff names.
index = ['Idris', 'Doyin', 'Temitayo']

# Create column names.
columns = ['Department', 'Age', 'Salary']


# Create dataframe by passing in data, index and columns.
pd.DataFrame(data=array_df, index=index, columns=columns)

Unnamed: 0,Department,Age,Salary
Idris,HR,25,50000
Doyin,IT,26,60000
Temitayo,Finance,27,45000
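One thing worth checking after building a dataframe from a mixed numpy array is the column types. The sketch below repeats the construction and then converts the numeric columns with `astype()`. It is an illustration of the caveat, not a required step.

```python
import numpy as np
import pandas as pd

array_df = np.array([["HR", 25, 50000], ["IT", 26, 60000], ["Finance", 27, 45000]])
df = pd.DataFrame(
    array_df,
    index=["Idris", "Doyin", "Temitayo"],
    columns=["Department", "Age", "Salary"],
)

# The numbers were stored as text because the array has one string dtype.
print(df.dtypes)
print(repr(df.loc["Idris", "Age"]))

# Convert the numeric columns back to integers.
typed = df.astype({"Age": "int64", "Salary": "int64"})
print(typed.dtypes)
print(typed["Salary"].sum())
```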


### When to use Dataframes
Pandas dataframes can accommodate heterogeneous data. This makes them the data structure of choice for manipulating often messy data (e.g., tabular data from spreadsheets or SQL tables).

We should use a Pandas dataframe if all of the following statements hold:

* We have 2-dimensional data (rows and columns)
* The data type is the same within a column
* We are interested in the index (rows) and column names

## IO Tools (Input/Output):
   - pd.read_csv(): Reading CSV files into DataFrames.
   - pd.read_excel(): Reading Excel files into DataFrames.
   - pd.read_sql(): Reading SQL database tables into DataFrames.
   - DataFrame.to_csv(): Writing DataFrames to CSV files.
   - DataFrame.to_excel(): Writing DataFrames to Excel files.
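The reading functions live on the `pd` namespace, while the writing functions are methods of the DataFrame itself. A minimal round trip, using an in-memory text buffer instead of a real file so that it runs anywhere:

```python
import io

import pandas as pd

frame = pd.DataFrame({"Name": ["Idris", "Doyin"], "Age": [25, 26]})

# Write: to_csv() is called on the DataFrame.
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Read: read_csv() is called on the pandas module.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```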

### Reading CSV File:

In [15]:
%pwd

'C:\\Users\\I\\Documents\\GitHub\\AltSchoolAfrica-Circle-104-Collaboration'

We use the `head()` method to look at only the first 5 records of our data. This is helpful when the dataframe has many rows and displaying all of them would be unwieldy.


In [16]:
# Reading a CSV file into a DataFrame
csv_file_path = '..//AltSchool-DataScience//dataset//sales_data.csv'
df_csv = pd.read_csv(csv_file_path)

# Displaying the DataFrame
print("DataFrame from CSV:")
df_csv.head()


DataFrame from CSV:


Unnamed: 0,Date,Day,Month,Year,Customer_Age,Age_Group,Customer_Gender,Country,State,Product_Category,Sub_Category,Product,Order_Quantity,Unit_Cost,Unit_Price,Profit,Cost,Revenue,Sales
0,11/26/2013,26.0,November,2013.0,19.0,Youth (<25),M,Canada,British Columbia,Accessories,Bike Racks,Hitch Rack - 4-Bike,8.0,45.0,120.0,590.0,360.0,950.0,960.0
1,11/26/2015,26.0,November,2015.0,19.0,Youth (<25),M,Canada,British Columbia,Accessories,Bike Racks,Hitch Rack - 4-Bike,8.0,45.0,120.0,590.0,360.0,950.0,960.0
2,3/23/2014,23.0,March,2014.0,49.0,Adults (35-64),M,Australia,New South Wales,Accessories,Bike Racks,Hitch Rack - 4-Bike,23.0,45.0,120.0,1366.0,1035.0,2401.0,2760.0
3,3/23/2016,23.0,March,2016.0,49.0,Adults (35-64),M,Australia,New South Wales,Accessories,Bike Racks,Hitch Rack - 4-Bike,20.0,45.0,120.0,1188.0,900.0,2088.0,2400.0
4,5/15/2014,15.0,May,2014.0,47.0,Adults (35-64),F,Australia,New South Wales,Accessories,Bike Racks,Hitch Rack - 4-Bike,4.0,45.0,120.0,238.0,180.0,418.0,480.0


We use the `tail()` method to look at only the last 5 records of our data. This is a quick way to check how the file ends, for example to spot empty or summary rows.


In [17]:
df_csv.tail()

Unnamed: 0,Date,Day,Month,Year,Customer_Age,Age_Group,Customer_Gender,Country,State,Product_Category,Sub_Category,Product,Order_Quantity,Unit_Cost,Unit_Price,Profit,Cost,Revenue,Sales
113033,4/2/2016,2.0,April,2016.0,18.0,Youth (<25),M,Australia,Queensland,Clothing,Vests,"Classic Vest, M",22.0,24.0,64.0,655.0,528.0,1183.0,1408.0
113034,3/4/2014,4.0,March,2014.0,37.0,Adults (35-64),F,France,Seine (Paris),Clothing,Vests,"Classic Vest, L",24.0,24.0,64.0,684.0,576.0,1260.0,1536.0
113035,3/4/2016,4.0,March,2016.0,37.0,Adults (35-64),F,France,Seine (Paris),Clothing,Vests,"Classic Vest, L",23.0,24.0,64.0,655.0,552.0,1207.0,1472.0
113036,,,,,,,,,,,,,,,,,,,
113037,,,,,,,,,,,,,1345316.0,,,,,85271008.0,


### Reading Excel File:

In [18]:
# Reading an Excel file into a DataFrame
excel_file_path = '..//AltSchool-DataScience//dataset/sales_data_xlsx.xlsx'
df_excel = pd.read_excel(excel_file_path, sheet_name='Sheet1')

# Displaying the DataFrame
print("\nDataFrame from Excel:")
df_excel



DataFrame from Excel:


### Reading SQL Database:
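The project does not include a database, so the sketch below is an assumption-laden illustration: it builds a throwaway in-memory SQLite database with a hypothetical `staff` table and reads a query result into a DataFrame with `pd.read_sql()`.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, age INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("Idris", 25, 50000), ("Doyin", 26, 60000), ("Temitayo", 27, 45000)],
)
conn.commit()

# Run a query and load the result into a DataFrame.
df_sql = pd.read_sql(
    "SELECT name, salary FROM staff WHERE age > 25 ORDER BY age", conn
)
conn.close()

print(df_sql)
```

For other database engines, pandas generally expects a SQLAlchemy connection rather than a raw driver connection.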

### Writing DataFrames to CSV files

In [19]:
# Writing the DataFrame to a new CSV file
output_csv_file = 'output_data.csv'
df_csv.to_csv(output_csv_file, index=False)

# Reading the written CSV file to verify
df_written_csv = pd.read_csv(output_csv_file)
print("\nDataFrame from Written CSV:")
df_written_csv.head()



DataFrame from Written CSV:


Unnamed: 0,Date,Day,Month,Year,Customer_Age,Age_Group,Customer_Gender,Country,State,Product_Category,Sub_Category,Product,Order_Quantity,Unit_Cost,Unit_Price,Profit,Cost,Revenue,Sales
0,11/26/2013,26.0,November,2013.0,19.0,Youth (<25),M,Canada,British Columbia,Accessories,Bike Racks,Hitch Rack - 4-Bike,8.0,45.0,120.0,590.0,360.0,950.0,960.0
1,11/26/2015,26.0,November,2015.0,19.0,Youth (<25),M,Canada,British Columbia,Accessories,Bike Racks,Hitch Rack - 4-Bike,8.0,45.0,120.0,590.0,360.0,950.0,960.0
2,3/23/2014,23.0,March,2014.0,49.0,Adults (35-64),M,Australia,New South Wales,Accessories,Bike Racks,Hitch Rack - 4-Bike,23.0,45.0,120.0,1366.0,1035.0,2401.0,2760.0
3,3/23/2016,23.0,March,2016.0,49.0,Adults (35-64),M,Australia,New South Wales,Accessories,Bike Racks,Hitch Rack - 4-Bike,20.0,45.0,120.0,1188.0,900.0,2088.0,2400.0
4,5/15/2014,15.0,May,2014.0,47.0,Adults (35-64),F,Australia,New South Wales,Accessories,Bike Racks,Hitch Rack - 4-Bike,4.0,45.0,120.0,238.0,180.0,418.0,480.0


### Writing DataFrames to Excel files

In [20]:
# Writing the DataFrame to a new Excel file
output_excel_file = 'output_data.xlsx'
df_excel.to_excel(output_excel_file, index=False, sheet_name='Sheet1')

# Reading the written Excel file to verify
df_written_excel = pd.read_excel(output_excel_file, sheet_name='Sheet1')
print("\nDataFrame from Written Excel:")
df_written_excel



DataFrame from Written Excel:


## Data Manipulation:
   - Indexing and Selecting Data: Methods to slice, index, and select specific data from DataFrames or Series.
   - Merge, Join, and Concatenate: Functions to combine DataFrames based on keys or indices.
   - Reshaping and Pivoting: Methods like melt(), pivot(), pivot_table() for reshaping data.
   

In [21]:
#recall our dataframe called 'frame' above
frame

Unnamed: 0,Name,Age,Salary,Department
0,Idris,25,50000,HR
1,Doyin,26,60000,IT
2,Temitayo,27,45000,Finance
3,Abdullahi,28,70000,IT
4,Yima,29,55000,Marketing


### Indexing and Data Selection

Accessing data within dataframes is not as straightforward as with the previous data structures. This can be done by index, by column, or by both. Let's work through these methods.

#### By Index
To access rows by index we can use the `iloc` or `loc` indexers with the indices in square brackets. `iloc` selects by integer position, so we pass in the position number, while `loc` selects by index label, so we pass in the index name. Use slicing if you want more than one row. E.g.:

* `dataframe.iloc[index i]` - returns series at index i
* `dataframe.iloc[index start: index end]` - returns dataframe from start to end (end not included)
* `dataframe.loc['index name']` - returns series of given index name

Let's look at a few examples:

In [22]:
# Select the 5th row using iloc[].
frame.iloc[4]

Name               Yima
Age                  29
Salary            55000
Department    Marketing
Name: 4, dtype: object

In [23]:
# Select rows 5 to 10 (only row 5 exists here; out-of-range slice bounds are ignored).
frame.iloc[4:10]

Unnamed: 0,Name,Age,Salary,Department
4,Yima,29,55000,Marketing


In [24]:
# Select the row where Name is 'Idris' using loc[] with a boolean mask.
frame.loc[frame['Name'] == 'Idris']

Unnamed: 0,Name,Age,Salary,Department
0,Idris,25,50000,HR
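The `loc` indexer is most useful when the index carries meaningful labels. As a sketch, the block below rebuilds the same staff data, moves `Name` into the index with `set_index()`, and then selects rows by name.

```python
import pandas as pd

frame = pd.DataFrame({
    "Name": ["Idris", "Doyin", "Temitayo", "Abdullahi", "Yima"],
    "Age": [25, 26, 27, 28, 29],
    "Salary": [50000, 60000, 45000, 70000, 55000],
    "Department": ["HR", "IT", "Finance", "IT", "Marketing"],
})

# Use the Name column as the row labels (returns a new DataFrame).
by_name = frame.set_index("Name")

# A single label returns that row as a Series.
idris = by_name.loc["Idris"]
print(idris)

# A label slice includes both endpoints.
subset = by_name.loc["Doyin":"Abdullahi"]
print(subset)
```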


#### By Column
To access by column only we can simply call `dataframe['Column Name']`. If we want more than one column we input a list of column names inside the square brackets:

* `dataframe['Column Name']` - returns series of given column
* `dataframe[['Column 1', 'Column 2']]` - returns dataframe with the given columns

Let's look at examples.

In [25]:
# Selecting a single column
ages = frame['Age']
print("\nAges Column:")
ages


Ages Column:


0    25
1    26
2    27
3    28
4    29
Name: Age, dtype: int64

In [26]:
# Selecting multiple columns
selected_columns = frame[['Name', 'Salary']]
print("\nSelected Columns:")
selected_columns


Selected Columns:


Unnamed: 0,Name,Salary
0,Idris,50000
1,Doyin,60000
2,Temitayo,45000
3,Abdullahi,70000
4,Yima,55000


#### By index and column
We can also select a subset of the dataframe using indices and columns in combination. Let's look at a few examples:

In [27]:
# Select the first 5 rows and the 'Salary' and 'Department' columns - Rows first.
frame.iloc[0:5][['Salary', 'Department']]

Unnamed: 0,Salary,Department
0,50000,HR
1,60000,IT
2,45000,Finance
3,70000,IT
4,55000,Marketing


In [28]:
# Select the first 5 rows and the 'Salary' and 'Department' columns - Columns first.
frame[['Salary', 'Department']].iloc[0:5]

Unnamed: 0,Salary,Department
0,50000,HR
1,60000,IT
2,45000,Finance
3,70000,IT
4,55000,Marketing
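Rows and columns can also be selected in a single step by passing both to `loc` or `iloc`, separated by a comma. A sketch on the same staff data:

```python
import pandas as pd

frame = pd.DataFrame({
    "Name": ["Idris", "Doyin", "Temitayo", "Abdullahi", "Yima"],
    "Age": [25, 26, 27, 28, 29],
    "Salary": [50000, 60000, 45000, 70000, 55000],
    "Department": ["HR", "IT", "Finance", "IT", "Marketing"],
})

# loc: row labels 0 through 2 (inclusive) and columns by name.
by_label = frame.loc[0:2, ["Salary", "Department"]]

# iloc: row positions 0..2 and column positions 2..3 (end excluded).
by_position = frame.iloc[0:3, 2:4]

print(by_label)
print(by_label.equals(by_position))
```

Doing the selection in one call avoids creating an intermediate dataframe, and it is also the form to use when assigning values to a subset.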


### Merge, Join, and Concatenate

*Consider two DataFrames, one containing information about employees and the other about their projects.*

In [29]:
# Employee DataFrame
employee_data = {'EmployeeID': [1, 2, 3, 4],
                 'Name': ['Idris', 'Doyin', 'Temitayo', 'Abdullahi']}
employees = pd.DataFrame(employee_data)


#  Project DataFrame
project_data = {'EmployeeID': [2, 3, 4, 5],
                'Project': ['ProjectA', 'ProjectB', 'ProjectC', 'ProjectD']}
projects = pd.DataFrame(project_data)


In [30]:
# Displaying the projects DataFrame created above
projects

Unnamed: 0,EmployeeID,Project
0,2,ProjectA
1,3,ProjectB
2,4,ProjectC
3,5,ProjectD


- **Merge**:</br>
Merging combines DataFrames based on one or more common columns (or on the index).

In [31]:
# Merging based on 'EmployeeID'
merged_df = pd.merge(employees, projects, on='EmployeeID', how='inner')
print("Merged DataFrame:")
merged_df


Merged DataFrame:


Unnamed: 0,EmployeeID,Name,Project
0,2,Doyin,ProjectA
1,3,Temitayo,ProjectB
2,4,Abdullahi,ProjectC
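The `how` argument controls which keys are kept. The inner merge above drops employees without a project and projects without an employee. The sketch below shows the `left` and `outer` variants on the same two tables, with `indicator=True` adding a `_merge` column that records where each row came from.

```python
import pandas as pd

employees = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4],
    "Name": ["Idris", "Doyin", "Temitayo", "Abdullahi"],
})
projects = pd.DataFrame({
    "EmployeeID": [2, 3, 4, 5],
    "Project": ["ProjectA", "ProjectB", "ProjectC", "ProjectD"],
})

# Left merge: keep every employee, with NaN where no project matches.
left = pd.merge(employees, projects, on="EmployeeID", how="left")
print(left)

# Outer merge: keep every key from both sides.
outer = pd.merge(employees, projects, on="EmployeeID", how="outer", indicator=True)
print(outer)
```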


- **Join**:</br>
Joining is similar to merging, but it combines DataFrames based on their index.

In [32]:
# Setting 'EmployeeID' as the index for both DataFrames
employees.set_index('EmployeeID', inplace=True)
projects.set_index('EmployeeID', inplace=True)

# Joining based on the index
joined_df = employees.join(projects, how='inner', lsuffix='_employee', rsuffix='_project')
print("Joined DataFrame:")
joined_df


Joined DataFrame:


EmployeeID,Name,Project
2,Doyin,ProjectA
3,Temitayo,ProjectB
4,Abdullahi,ProjectC


- **Concatenate**:</br>
Concatenation combines DataFrames along a specified axis (either rows or columns).

In [33]:
# Concatenating along rows (axis=0)
concatenated_df = pd.concat([employees, projects], ignore_index=True)
print("Concatenated DataFrame:")
concatenated_df


Concatenated DataFrame:


Unnamed: 0,Name,Project
0,Idris,
1,Doyin,
2,Temitayo,
3,Abdullahi,
4,,ProjectA
5,,ProjectB
6,,ProjectC
7,,ProjectD
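Concatenating along columns (`axis=1`) places the tables side by side and aligns the rows on the index. The sketch below rebuilds both tables with `EmployeeID` as the index, so it does not depend on the earlier cells.

```python
import pandas as pd

employees = pd.DataFrame(
    {"EmployeeID": [1, 2, 3, 4],
     "Name": ["Idris", "Doyin", "Temitayo", "Abdullahi"]}
).set_index("EmployeeID")
projects = pd.DataFrame(
    {"EmployeeID": [2, 3, 4, 5],
     "Project": ["ProjectA", "ProjectB", "ProjectC", "ProjectD"]}
).set_index("EmployeeID")

# Default join is 'outer': every index label from either table is kept.
side_by_side = pd.concat([employees, projects], axis=1)
print(side_by_side)

# join='inner' keeps only the labels present in both tables.
inner = pd.concat([employees, projects], axis=1, join="inner")
print(inner)
```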


### Reshaping and Pivoting
Reshaping and pivoting are common operations in pandas that allow you to transform the structure of your data.

Consider a dataset representing the sales of products in different regions and months:

In [34]:
data = {
    'Month': ['Jan', 'Jan', 'Feb', 'Feb', 'Mar', 'Mar'],
    'Region': ['North', 'South', 'North', 'South', 'North', 'South'],
    'Product_A': [100, 120, 150, 130, 110, 140],
    'Product_B': [200, 180, 210, 190, 220, 200],
}

df = pd.DataFrame(data)
print("Original DataFrame:")
df


Original DataFrame:


Unnamed: 0,Month,Region,Product_A,Product_B
0,Jan,North,100,200
1,Jan,South,120,180
2,Feb,North,150,210
3,Feb,South,130,190
4,Mar,North,110,220
5,Mar,South,140,200


**Reshaping with `melt`**</br>
The `melt` function transforms wide-format data into long format, with one row per observation, which often makes the data easier to analyze and work with.

In [35]:
melted_df = pd.melt(df, id_vars=['Month', 'Region'], var_name='Product', value_name='Sales')
print("\nMelted DataFrame:")
melted_df


Melted DataFrame:


Unnamed: 0,Month,Region,Product,Sales
0,Jan,North,Product_A,100
1,Jan,South,Product_A,120
2,Feb,North,Product_A,150
3,Feb,South,Product_A,130
4,Mar,North,Product_A,110
5,Mar,South,Product_A,140
6,Jan,North,Product_B,200
7,Jan,South,Product_B,180
8,Feb,North,Product_B,210
9,Feb,South,Product_B,190
10,Mar,North,Product_B,220
11,Mar,South,Product_B,200


**Pivoting with `pivot_table`:**</br>
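`pivot_table` performs the reverse operation, converting long-format data back to wide format. When several rows share the same index and column keys it aggregates them, using the mean by default.

Now, let's pivot the data back into a wide format using the `pivot_table` function.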

In [None]:
pivoted_df = melted_df.pivot_table(index=['Month', 'Region'], columns='Product', values='Sales').reset_index()
print("\nPivoted DataFrame:")
pivoted_df



Pivoted DataFrame:
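Since the pivot cell has no recorded output, here is a self-contained sketch of the full melt and pivot round trip on the same sales data. One thing to expect: `pivot_table` sorts the index, so the months come back in alphabetical order (Feb, Jan, Mar), and the values become floats because of the default mean aggregation.

```python
import pandas as pd

df = pd.DataFrame({
    "Month": ["Jan", "Jan", "Feb", "Feb", "Mar", "Mar"],
    "Region": ["North", "South", "North", "South", "North", "South"],
    "Product_A": [100, 120, 150, 130, 110, 140],
    "Product_B": [200, 180, 210, 190, 220, 200],
})

# Wide to long: one row per (Month, Region, Product).
melted = pd.melt(df, id_vars=["Month", "Region"],
                 var_name="Product", value_name="Sales")

# Long back to wide: one column per product.
pivoted = melted.pivot_table(index=["Month", "Region"],
                             columns="Product", values="Sales").reset_index()
print(pivoted)

# Look up one cell to confirm the round trip preserved the value.
jan_north = pivoted[(pivoted["Month"] == "Jan") & (pivoted["Region"] == "North")]
print(jan_north["Product_A"].iloc[0])
```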


## Data Cleaning and Handling:
   - Missing Data Handling: Methods like dropna(), fillna() to handle missing values.
   - Duplicates Handling: drop_duplicates() to remove duplicate rows.
   - Data Conversion: Methods like astype() for changing data types.

In [None]:
#recall our dataframe called 'frame' above
frame1 = frame.copy()

### Missing Data Handling

In [None]:
# Handling missing values
frame1.loc[2, 'Salary'] = None
print("\nDataFrame with Missing Value:")
frame1

In [None]:
# Checking for missing values
print("\nMissing Values:")
frame1.isnull()

In [None]:
# Dropping rows with missing values
frame_cleaned = frame1.dropna()
print("\nDataFrame after Dropping Missing Values:")
frame_cleaned

In [None]:
# Filling missing values with the mean of the remaining salaries
frame_filled = frame1.fillna(value={'Salary': frame1['Salary'].mean()})
print("\nDataFrame after Filling Missing Values:")
frame_filled

### Duplicates Handling

In [None]:
# Checking for duplicates
print("\nDuplicate Rows:")
frame1.duplicated()

In [None]:
# Dropping duplicate rows
frame_no_duplicates = frame1.drop_duplicates()
print("\nDataFrame after Dropping Duplicates:")
frame_no_duplicates
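The staff dataframe has no repeated rows, so the calls above leave it unchanged. The sketch below uses a small made-up table that does contain duplicates to show what `duplicated()` and `drop_duplicates()` actually do, including the `subset` and `keep` arguments.

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 0 exactly; row 3 repeats only the name.
staff = pd.DataFrame({
    "Name": ["Idris", "Doyin", "Idris", "Doyin"],
    "Department": ["HR", "IT", "HR", "Finance"],
})

# True marks a row that is identical to an earlier row.
flags = staff.duplicated()
print(flags.tolist())

# Drop fully identical rows (the first occurrence is kept).
unique_rows = staff.drop_duplicates()
print(unique_rows)

# Compare on one column only and keep the last occurrence.
last_per_name = staff.drop_duplicates(subset="Name", keep="last")
print(last_per_name)
```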


### Data Conversion

In [None]:
# Converting 'Age' column to integers
frame['Age'] = frame['Age'].astype(int)
print("\nDataFrame after Data Conversion:")
frame
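Conversion matters most when numbers arrive as text, for example from a CSV file. `astype(int)` fails on values that are not valid numbers, so a common alternative is `pd.to_numeric()` with `errors="coerce"`. The input below is made up for illustration.

```python
import pandas as pd

# Hypothetical raw column: ages read as text, one of them unusable.
raw = pd.Series(["25", "26", "n/a", "28"])

# Invalid entries become NaN instead of raising an error.
ages = pd.to_numeric(raw, errors="coerce")
print(ages)

# The nullable 'Int64' dtype keeps integers alongside missing values.
ages_int = ages.astype("Int64")
print(ages_int)
```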

## Statistical and Mathematical Functions:
   - Descriptive Statistics: Methods like mean(), sum(), min(), max(), describe() to calculate statistics.
   - Correlation and Covariance: Functions like corr(), cov() to compute correlation and covariance.


In [None]:
#recall our dataframe called 'frame' above
frame

### Descriptive Statistics

In [None]:
# Calculating mean, median, minimum, maximum and a summary table
mean_age = frame['Age'].mean()
median_salary = frame['Salary'].median()
min_salary = frame['Salary'].min()
max_salary = frame['Salary'].max()
description = frame.describe()

In [None]:
print("\nDescriptive Statistics:\n")

print("Mean Age:", mean_age)
print("Median Salary:", median_salary)
print("Minimum Salary:", min_salary)
print("Maximum Salary:", max_salary)
print("\nDescribe dataframe:") 
description

### Correlation and Covariance:
The `corr()` and `cov()` methods only work on numeric data, so we exclude the non-numeric columns before calculating the correlation and covariance.

In [None]:
#recall our dataframe called 'frame' above
frame

In [None]:
frame['Age']

In [None]:
# Excluding non-numeric columns
numeric_df = frame.select_dtypes(include='number')

In [None]:
# Calculating correlation matrix
correlation_matrix = numeric_df.corr()
print("\nCorrelation Matrix:")
correlation_matrix

In [None]:
# Calculating covariance matrix
covariance_matrix = numeric_df.cov()
print("\nCovariance Matrix:")
covariance_matrix
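For a single pair of columns, the same quantities are available directly on a Series. The sketch below computes the Pearson correlation and the sample covariance between `Age` and `Salary`, and cross-checks the correlation against NumPy.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    "Age": [25, 26, 27, 28, 29],
    "Salary": [50000, 60000, 45000, 70000, 55000],
})

# Pearson correlation between two columns.
r = frame["Age"].corr(frame["Salary"])

# Sample covariance (divides by n - 1).
cov = frame["Age"].cov(frame["Salary"])

# Cross-check the correlation with NumPy.
r_numpy = np.corrcoef(frame["Age"], frame["Salary"])[0, 1]

print(round(r, 4), cov, round(r_numpy, 4))
```

With only five rows the correlation is weak (about 0.33) and should not be over-interpreted.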

## Time Series and Date Functionality:
   - Date-Time Indexing: Methods to work with time series data using date-time indexing.
   - Time Resampling and Shifting: Functions like resample(), shift() for time-based operations.

### Date-Time Indexing

In [None]:
# Creating a sample DataFrame with time series data
date_rng = pd.date_range(start='2023-01-01', end='2023-01-05', freq='D')
data = {
    'Date': date_rng,
    'Temperature': [32, 35, 28, 24, 30],
    'Humidity': [40, 45, 38, 42, 36]
}

df_time_series = pd.DataFrame(data)
print("Original DataFrame:")
df_time_series

### Time Indexing

In [None]:
# Setting the 'Date' column as the index
df_time_series.set_index('Date', inplace=True)
print("\nDataFrame with Time Index:")
df_time_series
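Once the dates are the index, rows can be selected with date strings. A sketch on the same temperature data, built with the dates as the index from the start:

```python
import pandas as pd

date_rng = pd.date_range(start="2023-01-01", end="2023-01-05", freq="D")
df = pd.DataFrame(
    {"Temperature": [32, 35, 28, 24, 30], "Humidity": [40, 45, 38, 42, 36]},
    index=date_rng,
)

# A date-string slice includes both endpoints.
window = df.loc["2023-01-02":"2023-01-04"]
print(window)

# A single date string selects that day's row.
one_day = df.loc["2023-01-03"]
print(one_day["Humidity"])
```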

### Resampling Time Series Data

In [None]:
# Resampling to weekly frequency
df_weekly = df_time_series.resample('W').mean()
print("\nResampled DataFrame (Weekly Mean):")
df_weekly
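To see what the weekly resample produces, the sketch below repeats it on a standalone copy of the data. By default, `'W'` groups into weeks ending on Sunday and labels each group with that Sunday. Because 2023-01-01 was a Sunday, it forms a group of its own, and the other four days fall into the week ending 2023-01-08.

```python
import pandas as pd

date_rng = pd.date_range(start="2023-01-01", end="2023-01-05", freq="D")
df = pd.DataFrame(
    {"Temperature": [32, 35, 28, 24, 30], "Humidity": [40, 45, 38, 42, 36]},
    index=date_rng,
)

# Weekly mean; each row is labeled with the Sunday that ends the week.
weekly = df.resample("W").mean()
print(weekly)

labels = [str(ts.date()) for ts in weekly.index]
print(labels)
```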

### Shifting Time Series Data

In [None]:
# Shifting the 'Temperature' column by one day
df_time_series['Temperature_Shifted'] = df_time_series['Temperature'].shift(1)
print("\nDataFrame with Shifted Temperature:")
df_time_series
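A common use of `shift()` is to compare each value with the previous one. The sketch below computes the day-over-day temperature change that way and checks that it matches the built-in `diff()` method.

```python
import pandas as pd

temps = pd.Series(
    [32, 35, 28, 24, 30],
    index=pd.date_range(start="2023-01-01", periods=5, freq="D"),
)

# Today's value minus yesterday's; the first day has no previous value.
change = temps - temps.shift(1)
print(change)

# diff() is a shortcut for the same calculation.
same_as_diff = change.equals(temps.diff())
print(same_as_diff)
```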


## Plotting and Visualization:
   - Integration with Matplotlib: Seamless integration with Matplotlib for customized plotting.
   - Plotting Methods: Methods like plot(), hist(), boxplot() for data visualization.

### Integration with Matplotlib
Pandas integrates with Matplotlib for plotting and visualization, so we now import the `matplotlib` package.

In [None]:
# importing the package
import matplotlib.pyplot as plt # Here we import 'pyplot' a module of the matplotlib library 
%matplotlib inline

In [None]:
# Plotting using our time series data
df_time_series['Temperature'].plot(title='Temperature Over Time', xlabel='Date', ylabel='Temperature (°C)')


### Plotting Methods

In [None]:
#recall our dataframe called 'frame' above
frame

### Line Plot

In [None]:
# Plotting a line chart for 'Age' and 'Salary'
frame.plot(x='Name', y=['Age', 'Salary'], kind='line', title='Age and Salary Comparison', marker='o')
print("A line chart comparing the 'Age' and 'Salary' for each person.")

### Bar Plot

In [None]:
# Plotting a bar chart for 'Salary' by department
frame.groupby('Department')['Salary'].sum().plot(kind='bar', title='Total Salary by Department', rot=0)
plt.xlabel('Department')
plt.ylabel('Total Salary')

print("A bar chart showing the total salary for each department.")

### Histogram

In [None]:
# Plotting a histogram for 'Age'
frame['Age'].plot(kind='hist', bins=np.arange(20, 35, 2), title='Age Distribution')
plt.xlabel('Age')
plt.ylabel('Frequency')

print("A histogram showing the distribution of ages")

### Box Plot

In [None]:
# Plotting a box plot for 'Salary'
frame['Salary'].plot(kind='box', title='Salary Distribution')

print("A box plot illustrating the distribution of salaries.")