## Import pandas

In [2]:
import pandas as pd

## Pandas Series

In [14]:
# Import the pandas library to use its functionalities
import pandas as pd

# 'average_score_in_class' is a list (or another iterable) containing average scores. Replace it with actual data.
average_score_in_class = [90, 82, 88, 94, 78]  # Example data

# Creating a Series from the 'average_score_in_class' data
my_series = pd.Series(average_score_in_class)

# Printing the content of the created Series
print("Content of my_series")
print(my_series)
print("\n")  # Prints blank lines between the outputs for readability

# Printing the type of the created variable 'my_series' to show it's a pandas Series
print(f"Type of data: {type(my_series)}")

# Explanation:
# - A pandas Series is a one-dimensional labeled array that can hold data of any type (integers, floats, strings, Python objects).
# - Here, 'my_series' is created with numerical data representing average scores.
# - The 'print' statements display the contents of 'my_series' and confirm its data type.


Content of my_series
0    90
1    82
2    88
3    94
4    78
dtype: int64


Type of data: <class 'pandas.core.series.Series'>
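A Series always carries an index alongside its values; above, it defaulted to the positions 0 to 4. The minimal sketch below passes custom labels instead (the student IDs `s1` to `s5` and the name `average_score` are hypothetical) to show label-based lookup and a vectorized reduction over the same example scores.

```python
import pandas as pd

# Same example scores as above, with hypothetical student IDs as index labels
scores = pd.Series(
    [90, 82, 88, 94, 78],
    index=["s1", "s2", "s3", "s4", "s5"],
    name="average_score",
)

print(scores["s3"])   # label-based lookup of a single value
print(scores.dtype)   # dtype inferred from the integer data
print(scores.mean())  # vectorized reduction over all values
```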


## Pandas DataFrame

In [15]:
# Import the pandas library to use its functionalities
import pandas as pd

# Define a dictionary with two keys: 'scores' and 'names'. Each key has a list of values.
# 'scores' contains numerical scores, and 'names' contains strings of names.
my_data = {"scores": [40, 50, 20, 99, 100], "names": ["kelvin", "andy", "josh", "frank", "aaron"]}

# Creating a DataFrame from the dictionary. This structures the data in a tabular format,
# where 'scores' and 'names' become column headers, and their corresponding lists become the column data.
my_dataframe = pd.DataFrame(my_data)

# Printing the content of the created DataFrame to display its structure and data
print("Content of my_dataframe")
print(my_dataframe)
print("\n")  # Prints blank lines between the outputs for readability

# Printing the type of the created variable 'my_dataframe' to show it's a pandas DataFrame
print(f"Type of data: {type(my_dataframe)}")

# Explanation:
# - A pandas DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).
# - Here, 'my_dataframe' is created from a dictionary, where each key-value pair corresponds to a column in the DataFrame.
# - The 'print' statements display the contents of 'my_dataframe' and confirm its data type, illustrating how to create and inspect a pandas DataFrame.


Content of my_dataframe
   scores   names
0      40  kelvin
1      50    andy
2      20    josh
3      99   frank
4     100   aaron


Type of data: <class 'pandas.core.frame.DataFrame'>
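Beyond printing, a few attributes summarize a DataFrame's structure. This sketch rebuilds the same example data under a new name, `demo_df`, so it does not overwrite `my_dataframe`, and inspects its shape, column labels and per-column dtypes.

```python
import pandas as pd

# Same example data as above, stored under a different variable name
demo_data = {"scores": [40, 50, 20, 99, 100],
             "names": ["kelvin", "andy", "josh", "frank", "aaron"]}
demo_df = pd.DataFrame(demo_data)

# (number of rows, number of columns)
print(demo_df.shape)
# Column labels, in the dictionary's insertion order
print(list(demo_df.columns))
# Each column keeps its own dtype, which is what "heterogeneous" refers to
print(demo_df.dtypes)
```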


## Loading data from a CSV file

To load data from a CSV file into a pandas DataFrame, use the `read_csv()` function from pandas. It reads a comma-separated values (CSV) file and returns a DataFrame, so you can work with the data in Python.

Steps:
1. Import pandas: first, import the pandas library.
2. Call `read_csv()`: pass the path of your CSV file as the argument.

In [7]:
df = pd.read_csv("../datasets/Iris.csv")
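If the `../datasets/Iris.csv` file is not available, `read_csv()` can be tried on an in-memory buffer instead, because it accepts any file-like object as well as a path. In this sketch the CSV text is typed in by hand from the first three rows of the Iris output shown in this notebook, and `index_col` makes the `Id` column the row index.

```python
import io

import pandas as pd

# Small in-memory CSV standing in for a file on disk
# (rows copied by hand from the Iris head() output in this notebook)
csv_text = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
3,4.7,3.2,1.3,0.2,Iris-setosa
"""

# read_csv accepts a file path or any file-like object such as StringIO
df_small = pd.read_csv(io.StringIO(csv_text))
print(df_small.shape)
print(df_small.dtypes)

# index_col turns a column into the row index instead of the default 0..n-1
df_by_id = pd.read_csv(io.StringIO(csv_text), index_col="Id")
print(df_by_id.index.tolist())
```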

#### Selecting a column as a Series

In [8]:
# Selecting one column by its label (single brackets) returns a Series
my_series = df['PetalLengthCm']

print("Content of data")
print(my_series)
print(f" \n Type of data: {type(my_series)}")

Content of data
0      1.4
1      1.4
2      1.3
3      1.5
4      1.4
      ... 
145    5.2
146    5.0
147    5.2
148    5.4
149    5.1
Name: PetalLengthCm, Length: 150, dtype: float64
 
 Type of data: <class 'pandas.core.series.Series'>
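The number of brackets decides the return type: a single column label gives a Series, while a list of labels (even a list with one label) gives a DataFrame. The sketch below uses a small hand-built stand-in, `iris_sample`, with values copied from this notebook's `head()` output, so it runs without the CSV file.

```python
import pandas as pd

# Stand-in for the Iris DataFrame: first three rows, two columns
iris_sample = pd.DataFrame({
    "PetalLengthCm": [1.4, 1.4, 1.3],
    "PetalWidthCm": [0.2, 0.2, 0.2],
})

col_series = iris_sample["PetalLengthCm"]   # single label -> Series
col_frame = iris_sample[["PetalLengthCm"]]  # list of labels -> DataFrame

print(type(col_series).__name__, col_series.name)
print(type(col_frame).__name__, col_frame.shape)
```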


#### Selecting columns as a DataFrame

In [9]:
# Selecting a list of column labels (double brackets) returns a DataFrame
my_dataframe = df[['PetalLengthCm', 'PetalWidthCm']]

print("Content of data")
print(my_dataframe)
print(f" \n Type of data: {type(my_dataframe)}")

Content of data
     PetalLengthCm  PetalWidthCm
0              1.4           0.2
1              1.4           0.2
2              1.3           0.2
3              1.5           0.2
4              1.4           0.2
..             ...           ...
145            5.2           2.3
146            5.0           1.9
147            5.2           2.0
148            5.4           2.3
149            5.1           1.8

[150 rows x 2 columns]
 
 Type of data: <class 'pandas.core.frame.DataFrame'>
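When selecting several columns, the result follows the order of the labels in the list, not the order in the source DataFrame, and a label that does not exist raises a `KeyError`. A minimal sketch on a hand-built stand-in (`iris_sample`, values copied from the `head()` output):

```python
import pandas as pd

# Stand-in for the Iris DataFrame: first three rows, four columns
iris_sample = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 4.7],
    "PetalLengthCm": [1.4, 1.4, 1.3],
    "PetalWidthCm": [0.2, 0.2, 0.2],
    "Species": ["Iris-setosa"] * 3,
})

# Columns come back in the order given in the list
subset = iris_sample[["Species", "PetalLengthCm"]]
print(list(subset.columns))

# A missing label raises KeyError
try:
    iris_sample[["PetalLengthCm", "PetalLenghtCm"]]  # second label deliberately misspelled
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)
```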


### Analyzing the data
#### 1. Getting the head and tail of the data

* What it does: Shows the first few rows of a DataFrame (5 by default).
* How to use: Call `df.head()`, or `df.head(n)` where n is the number of rows you want to see.
* Example: `df.head()` displays the first 5 rows, and `df.head(10)` displays the first 10 rows.

In [10]:
# Getting the first 5 rows of the dataset
df.head()

| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
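`head()` also accepts a negative `n`, in which case it returns every row except the last `|n|`. The sketch below uses a ten-row stand-in whose `Id` and `PetalLengthCm` values are copied from the first ten rows of the Iris data in this notebook.

```python
import pandas as pd

# Ten-row stand-in (Id and PetalLengthCm of the first ten Iris rows)
sample = pd.DataFrame({
    "Id": list(range(1, 11)),
    "PetalLengthCm": [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5],
})

first_three = sample.head(3)        # rows at positions 0, 1, 2
all_but_last_two = sample.head(-2)  # negative n: every row except the last 2

print(first_three)
print(len(all_but_last_two))
```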


#### Tail of the data

* What it does: Shows the last few rows of a DataFrame (5 by default).
* How to use: Call `df.tail()`, or `df.tail(n)` to see the last n rows of the DataFrame.
* Example: `df.tail()` displays the last 5 rows, and `df.tail(10)` displays the last 10 rows.

In [11]:
# Getting the last 5 rows of the dataset
df.tail()

| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 145 | 146 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 147 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 148 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 149 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
| 149 | 150 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
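Likewise, `tail()` with a negative `n` returns every row except the first `|n|`, and the returned rows keep their original index labels. A minimal sketch on a ten-row stand-in containing only an `Id` column:

```python
import pandas as pd

# Ten-row stand-in with Ids 1..10 and the default index 0..9
sample = pd.DataFrame({"Id": list(range(1, 11))})

last_three = sample.tail(3)          # last 3 rows, original index labels kept
all_but_first_two = sample.tail(-2)  # negative n: every row except the first 2

print(last_three.index.tolist())
print(all_but_first_two["Id"].tolist())
```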


#### 2. Data Selection and Indexing
#### iloc

* What it does: Selects data based on row and column positions (integers).* 
How to use:` df.ilc[row_index, column_index]`.
]
Example: To select data from the first three rows and the first and third columns`: df.iloc[0:3, [0, 2`]].

In [12]:
# df.iloc[row_positions, column_positions]
df.iloc[1]

Id                         2
SepalLengthCm            4.9
SepalWidthCm             3.0
PetalLengthCm            1.4
PetalWidthCm             0.2
Species          Iris-setosa
Name: 1, dtype: object
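The row-and-column form from the bullet list can be tried on a small stand-in. This sketch builds `iris_sample` from the first four rows of the `head()` output (only four of the columns, chosen for illustration) and applies the `iloc[0:3, [0, 2]]` example, plus a single-cell lookup.

```python
import pandas as pd

# Stand-in: first four Iris rows, four of the columns
iris_sample = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "SepalLengthCm": [5.1, 4.9, 4.7, 4.6],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.1],
    "Species": ["Iris-setosa"] * 4,
})

# First three rows, first and third columns (positions 0 and 2)
block = iris_sample.iloc[0:3, [0, 2]]
print(block)

# A single cell: row position 1, column position 1
cell = iris_sample.iloc[1, 1]
print(cell)
```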

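#### Slicing with iloc
* What it does: Slices rows (and optionally columns) by integer position. As with Python list slicing, the start position is included and the stop position is excluded.
* How to use: `df.iloc[start:stop]` for rows, or `df.iloc[start:stop, col_start:col_stop]` for rows and columns.
* Example: `df.iloc[:10]` returns the rows at positions 0 through 9.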

In [13]:
# Slicing with iloc
df.iloc[:10]

| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 5 | 6 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
| 6 | 7 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
| 7 | 8 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
| 8 | 9 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
| 9 | 10 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |
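`iloc` slices behave like Python list slices: the stop position is excluded, negative positions count from the end, and a step is allowed. A minimal sketch on a ten-row stand-in containing only an `Id` column:

```python
import pandas as pd

# Ten-row stand-in with Ids 1..10
sample = pd.DataFrame({"Id": list(range(1, 11))})

first_three = sample.iloc[0:3]["Id"].tolist()  # positions 0, 1, 2 (stop excluded)
last_two = sample.iloc[-2:]["Id"].tolist()     # negative start counts from the end
every_fourth = sample.iloc[::4]["Id"].tolist() # positions 0, 4, 8

print(first_three, last_two, every_fourth)
```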


#### loc

* What it does: Selects data based on row and column labels.
* How to use: `df.loc[row_label, column_label]`
* Example: To select the rows labeled 1 to 3 and the columns 'PetalLengthCm' and 'Species': `df.loc[1:3, ['PetalLengthCm', 'Species']]`.

In [27]:
# df.loc[row_labels, column_labels]; with the default index the row labels are 0, 1, 2, ...
df.loc[0]

Id                         1
SepalLengthCm            5.1
SepalWidthCm             3.5
PetalLengthCm            1.4
PetalWidthCm             0.2
Species          Iris-setosa
Name: 0, dtype: object
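With the default `RangeIndex`, row labels and positions happen to be the same numbers, which can hide the difference between `loc` and `iloc`. This sketch gives four rows (values from the `head()` output) the hypothetical labels 10, 20, 30 and 40, so the two accessors visibly pick different rows.

```python
import pandas as pd

# Four Iris rows with hypothetical row labels that differ from their positions
labeled = pd.DataFrame(
    {
        "PetalLengthCm": [1.4, 1.4, 1.3, 1.5],
        "Species": ["Iris-setosa"] * 4,
    },
    index=[10, 20, 30, 40],
)

# loc uses labels: the row labeled 30 (which sits at position 2)
by_label = labeled.loc[30, "PetalLengthCm"]
# iloc uses positions: the row at position 3 (which is labeled 40)
by_position = labeled.iloc[3, 0]
print(by_label, by_position)

# A list of row labels combined with a list of column labels
picked = labeled.loc[[10, 40], ["Species", "PetalLengthCm"]]
print(picked)
```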

#### Slicing with loc
* For rows: `df.loc[start_label:end_label]` includes both the start and the end label.
* For columns: `df.loc[:, 'start_column':'end_column']` also includes the end column.
* Example: `df.loc[1:3, 'PetalLengthCm':'Species']` selects the rows labeled 1 to 3 and the columns from 'PetalLengthCm' to 'Species'.

In [28]:
# Slicing with df.loc (the end label 10 is included)
df.loc[3:10]

| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 5 | 6 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
| 6 | 7 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
| 7 | 8 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
| 8 | 9 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
| 9 | 10 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |
| 10 | 11 | 5.4 | 3.7 | 1.5 | 0.2 | Iris-setosa |
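The inclusive end is the main practical difference from `iloc` slicing. This sketch compares the two on a five-row stand-in built from the `head()` output, and also slices a range of columns by label.

```python
import pandas as pd

# Stand-in: first five Iris rows, default index 0..4
iris_sample = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6],
    "PetalLengthCm": [1.4, 1.4, 1.3, 1.5, 1.4],
    "PetalWidthCm": [0.2, 0.2, 0.2, 0.2, 0.2],
    "Species": ["Iris-setosa"] * 5,
})

# Labels and positions coincide here, but the slice ends differ:
n_loc = len(iris_sample.loc[1:3])    # labels 1, 2, 3 (end included)
n_iloc = len(iris_sample.iloc[1:3])  # positions 1, 2 (end excluded)
print(n_loc, n_iloc)

# Column slices with loc include the end column too
cols = iris_sample.loc[:, "SepalWidthCm":"PetalWidthCm"].columns.tolist()
print(cols)
```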
