## [INTRO]

**Script:** "Hi everyone, welcome back! If you've been following along, you already know how to set up your Python environment, organize your project files, and manage virtual environments using Anaconda. If not, be sure to check out those videos first. Today, we're diving into Python Pandas, an essential library for data manipulation and analysis. Whether you're just starting out or looking to refine your skills, this tutorial will give you a solid foundation in Pandas."

**Visual:** Show a brief intro slide with mentions of previous videos and the Pandas logo.

### [SECTION 1: Installing Pandas]
**Script:** "Before we begin, make sure Pandas is installed in your virtual environment. The full Anaconda distribution ships with Pandas in its base environment, but a newly created environment may not include it. If it's missing, you can install it with pip."

**Code:** `pip install pandas`

**Visual:** Show the command running in a terminal.
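
To confirm the install worked from inside Python, a quick check along these lines may help. This is only a sketch: the variable names are illustrative, and the version printed depends on your environment.

```python
import importlib.util

# find_spec returns None when the package cannot be found in this environment
pandas_spec = importlib.util.find_spec("pandas")
pandas_available = pandas_spec is not None

if pandas_available:
    import pandas as pd
    print("pandas version:", pd.__version__)
else:
    print("pandas is not installed in this environment")
```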

### [SECTION 2: Importing Pandas]
**Script:** "Once Pandas is installed, you need to import it into your Python script or notebook. Here's how you do it."

In [1]:
import pandas as pd

### [SECTION 3: Loading Data]
**Script:** "Now, let's load some data. Pandas can read data from various formats like CSV, Excel, SQL, and more. In this example, we'll load a CSV file."

In [2]:
# Load data from a CSV file
df = pd.read_csv('../Dataset/heart copy.csv')

# Display the first few rows of the dataframe
print(df.head())

   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   52    1   0       125   212    0        1      NaN      0      1.0      2   
1   53    1   0       140   203    1        0    155.0      1      3.1      0   
2   70    1   0       145   174    0        1    125.0      1      2.6      0   
3   61    1   0       148   203    0        1    161.0      0      0.0      2   
4   62    0   0       138   294    1        1    106.0      0      1.9      1   

   ca  thal  target  
0   2     3       0  
1   0     3       0  
2   0     3       0  
3   1     3       0  
4   3     2       0  
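
If you don't have the dataset at hand, the same loading step can be tried on a small in-memory sample. The sketch below assumes a made-up three-row CSV (the values are illustrative, not the real file) and uses `io.StringIO` so that `read_csv` has something file-like to read.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the real CSV file
csv_text = """age,sex,chol,thalach
52,1,212,
53,1,203,155
70,1,174,125
"""

# read_csv accepts a file path or any file-like object;
# usecols keeps only the listed columns
sample_df = pd.read_csv(io.StringIO(csv_text), usecols=["age", "chol", "thalach"])

print(sample_df.head(2))  # head(n) limits the preview to the first n rows
print(sample_df.dtypes)   # the empty thalach cell becomes NaN, so the column is float64
```

The empty cell in the first row is read as `NaN`, which is why `thalach` shows up as a float column in the real dataset as well.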


### [SECTION 4: Exploring the Data]
**Script:** "After loading the data, you might want to explore it a bit. Pandas provides several functions to understand your data better."

In [3]:
# Display the shape of the dataframe
print(df.shape)

(1025, 14)


In [4]:
# Get a summary of the dataframe
# info() prints the summary itself and returns None, so no print() is needed
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1025 entries, 0 to 1024
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1025 non-null   int64  
 1   sex       1025 non-null   int64  
 2   cp        1025 non-null   int64  
 3   trestbps  1025 non-null   int64  
 4   chol      1025 non-null   int64  
 5   fbs       1025 non-null   int64  
 6   restecg   1025 non-null   int64  
 7   thalach   1019 non-null   float64
 8   exang     1025 non-null   int64  
 9   oldpeak   1025 non-null   float64
 10  slope     1025 non-null   int64  
 11  ca        1025 non-null   int64  
 12  thal      1025 non-null   int64  
 13  target    1025 non-null   int64  
dtypes: float64(2), int64(12)
memory usage: 112.2 KB


In [5]:
# Display basic statistics of the numerical columns
print(df.describe())

               age          sex           cp     trestbps        chol  \
count  1025.000000  1025.000000  1025.000000  1025.000000  1025.00000   
mean     54.434146     0.695610     0.942439   131.611707   246.00000   
std       9.072290     0.460373     1.029641    17.516718    51.59251   
min      29.000000     0.000000     0.000000    94.000000   126.00000   
25%      48.000000     0.000000     0.000000   120.000000   211.00000   
50%      56.000000     1.000000     1.000000   130.000000   240.00000   
75%      61.000000     1.000000     2.000000   140.000000   275.00000   
max      77.000000     1.000000     3.000000   200.000000   564.00000   

               fbs      restecg      thalach        exang      oldpeak  \
count  1025.000000  1025.000000  1019.000000  1025.000000  1025.000000   
mean      0.149268     0.529756   149.089303     0.336585     1.071512   
std       0.356527     0.527878    23.025161     0.472772     1.175053   
min       0.000000     0.000000    71.000000  
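
Two more exploration calls that often come in handy are `isna().sum()` for counting missing values per column and `value_counts()` for category frequencies. The sketch below runs them on a small made-up frame (the values are illustrative, not taken from the dataset).

```python
import pandas as pd

# Hypothetical sample; None marks a missing measurement
sample_df = pd.DataFrame({
    "age": [52, 53, 70, 61],
    "thalach": [None, 155.0, 125.0, 161.0],
    "target": [0, 0, 1, 0],
})

missing_per_column = sample_df.isna().sum()          # number of NaN values per column
target_counts = sample_df["target"].value_counts()   # how often each value occurs

print(missing_per_column)
print(target_counts)
```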

### [SECTION 5: Selecting Data]
**Script:** "Pandas makes it easy to select data from your DataFrame. You can select columns, rows, or specific values using different methods."

In [6]:
# Select a single column
print(df['age'])

0       52
1       53
2       70
3       61
4       62
        ..
1020    59
1021    60
1022    47
1023    50
1024    54
Name: age, Length: 1025, dtype: int64


In [7]:
# Select multiple columns
print(df[['age', 'sex']])

      age  sex
0      52    1
1      53    1
2      70    1
3      61    1
4      62    0
...   ...  ...
1020   59    1
1021   60    1
1022   47    1
1023   50    0
1024   54    1

[1025 rows x 2 columns]


In [8]:
# Select the first five rows by integer position (iloc is position-based; the end of the slice is excluded)
print(df.iloc[0:5])

   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   52    1   0       125   212    0        1      NaN      0      1.0      2   
1   53    1   0       140   203    1        0    155.0      1      3.1      0   
2   70    1   0       145   174    0        1    125.0      1      2.6      0   
3   61    1   0       148   203    0        1    161.0      0      0.0      2   
4   62    0   0       138   294    1        1    106.0      0      1.9      1   

   ca  thal  target  
0   2     3       0  
1   0     3       0  
2   0     3       0  
3   1     3       0  
4   3     2       0  
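
The difference between position-based and label-based selection is easier to see on a frame whose index does not start at 0. The following sketch uses a made-up four-row frame with labels 10 to 13 (an assumption for illustration) and also shows boolean filtering and single-value lookup.

```python
import pandas as pd

sample_df = pd.DataFrame(
    {"age": [52, 53, 70, 61], "sex": [1, 1, 1, 0], "chol": [212, 203, 174, 203]},
    index=[10, 11, 12, 13],  # non-default labels make loc vs. iloc visible
)

first_two = sample_df.iloc[0:2]                   # positions 0 and 1; end is excluded
by_label = sample_df.loc[10:11, ["age", "chol"]]  # labels 10 through 11; end is included
over_60 = sample_df[sample_df["age"] > 60]        # boolean mask keeps matching rows
single_value = sample_df.at[12, "chol"]           # one cell by row label and column name

print(first_two)
print(by_label)
print(over_60)
print(single_value)
```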


### [SECTION 6: Data Manipulation]
**Script:** "Pandas also offers powerful data manipulation features. You can add new columns, drop unnecessary ones, handle missing data, and more."

In [9]:
# Add a new column
df['demo_01'] = df['age'] + df['sex']

In [10]:
# Drop a column
df = df.drop('cp', axis=1)

In [11]:
df.head(10)

   age  sex  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal  target  demo_01
0   52    1       125   212    0        1      NaN      0      1.0      2   2     3       0       53
1   53    1       140   203    1        0    155.0      1      3.1      0   0     3       0       54
2   70    1       145   174    0        1    125.0      1      2.6      0   0     3       0       71
3   61    1       148   203    0        1    161.0      0      0.0      2   1     3       0       62
4   62    0       138   294    1        1    106.0      0      1.9      1   3     2       0       62
5   58    0       100   248    0        0    122.0      0      1.0      1   0     2       1       58
6   58    1       114   318    0        2    140.0      0      4.4      0   3     1       0       59
7   55    1       160   289    0        0    145.0      1      0.8      1   1     3       0       56
8   46    1       120   249    0        0    144.0      0      0.8      2   0     3       0       47
9   54    1       122   286    0        0    116.0      1      3.2      1   2     2       0       55
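
An alternative way to add and drop columns, shown here as a sketch on a made-up two-row frame, is to chain `assign` and `drop(columns=...)`. Both return new DataFrames, so the original stays untouched until you reassign it.

```python
import pandas as pd

sample_df = pd.DataFrame({"age": [52, 53], "sex": [1, 1], "cp": [0, 0]})

# assign returns a new DataFrame; sample_df itself is unchanged
with_sum = sample_df.assign(demo_01=sample_df["age"] + sample_df["sex"])

# columns= is equivalent to passing the label together with axis=1
trimmed = with_sum.drop(columns=["cp"])

print(trimmed)
```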


In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1025 entries, 0 to 1024
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1025 non-null   int64  
 1   sex       1025 non-null   int64  
 2   trestbps  1025 non-null   int64  
 3   chol      1025 non-null   int64  
 4   fbs       1025 non-null   int64  
 5   restecg   1025 non-null   int64  
 6   thalach   1019 non-null   float64
 7   exang     1025 non-null   int64  
 8   oldpeak   1025 non-null   float64
 9   slope     1025 non-null   int64  
 10  ca        1025 non-null   int64  
 11  thal      1025 non-null   int64  
 12  target    1025 non-null   int64  
 13  demo_01   1025 non-null   int64  
dtypes: float64(2), int64(12)
memory usage: 112.2 KB


In [13]:
# Fill missing values: the lines below are alternative strategies, so uncomment only one
# df['thalach'] = df['thalach'].fillna(0)  # Replace missing values with 0

# df['thalach'] = df['thalach'].fillna(df['thalach'].mean())  # Replace missing values with the mean

# df['thalach'] = df['thalach'].fillna(df['thalach'].median())  # Replace missing values with the median

# df['thalach'] = df['thalach'].fillna(df['thalach'].mode()[0])  # Replace missing values with the mode
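
To see one of these strategies in action without touching the real dataset, here is a minimal sketch of the median fill on a made-up four-value column (the values are illustrative).

```python
import pandas as pd

sample_df = pd.DataFrame({"thalach": [None, 155.0, 125.0, 161.0]})

# median() skips NaN by default, so it is computed from the three known values
median_value = sample_df["thalach"].median()
filled = sample_df["thalach"].fillna(median_value)

print(median_value)
print(filled.isna().sum())  # no missing values remain after the fill
```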


In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1025 entries, 0 to 1024
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1025 non-null   int64  
 1   sex       1025 non-null   int64  
 2   trestbps  1025 non-null   int64  
 3   chol      1025 non-null   int64  
 4   fbs       1025 non-null   int64  
 5   restecg   1025 non-null   int64  
 6   thalach   1019 non-null   float64
 7   exang     1025 non-null   int64  
 8   oldpeak   1025 non-null   float64
 9   slope     1025 non-null   int64  
 10  ca        1025 non-null   int64  
 11  thal      1025 non-null   int64  
 12  target    1025 non-null   int64  
 13  demo_01   1025 non-null   int64  
dtypes: float64(2), int64(12)
memory usage: 112.2 KB


In [15]:
# Remove rows with missing values
df = df.dropna()

In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1019 entries, 1 to 1024
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       1019 non-null   int64  
 1   sex       1019 non-null   int64  
 2   trestbps  1019 non-null   int64  
 3   chol      1019 non-null   int64  
 4   fbs       1019 non-null   int64  
 5   restecg   1019 non-null   int64  
 6   thalach   1019 non-null   float64
 7   exang     1019 non-null   int64  
 8   oldpeak   1019 non-null   float64
 9   slope     1019 non-null   int64  
 10  ca        1019 non-null   int64  
 11  thal      1019 non-null   int64  
 12  target    1019 non-null   int64  
 13  demo_01   1019 non-null   int64  
dtypes: float64(2), int64(12)
memory usage: 119.4 KB
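
Notice that the index above now runs from 1 to 1024 with gaps, because `dropna` keeps the original row labels. The sketch below, on a made-up three-row frame, shows how `subset` restricts the check to chosen columns and how `reset_index(drop=True)` renumbers the remaining rows.

```python
import pandas as pd

sample_df = pd.DataFrame({
    "age": [52, 53, 70],
    "thalach": [None, 155.0, 125.0],
})

cleaned = sample_df.dropna(subset=["thalach"])  # only look for NaN in the thalach column
renumbered = cleaned.reset_index(drop=True)     # relabel the remaining rows from 0

print(cleaned.index.tolist())     # original labels are kept
print(renumbered.index.tolist())  # labels restart at 0
```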


### [SECTION 7: Saving Data]
**Script:** "Once you're done with your data analysis or manipulation, you might want to save your work. Pandas makes it easy to save your DataFrame to a file."

In [17]:
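# Save the dataframe to a new CSV file
# (the target folder must already exist; index=False leaves out the row labels)
df.to_csv('../New Dataset/new_dataset.csv', index=False)

# Save the dataframe to an Excel file
# (writing .xlsx requires the openpyxl package to be installed)
df.to_excel('../New Dataset/new_dataset.xlsx', index=False)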
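
A quick way to check that a save worked is to read the file back and compare it with the original. The sketch below does this in a temporary folder with a made-up two-row frame. The folder name mirrors the one used above, but everything here is illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

sample_df = pd.DataFrame({"age": [53, 70], "thalach": [155.0, 125.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_dir = Path(tmp_dir) / "New Dataset"
    out_dir.mkdir(parents=True, exist_ok=True)  # to_csv does not create missing folders

    csv_path = out_dir / "new_dataset.csv"
    sample_df.to_csv(csv_path, index=False)     # write without the row labels

    reloaded = pd.read_csv(csv_path)            # read the file back while it still exists

print(reloaded.equals(sample_df))
```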


## [OUTRO]
**Script:** "That wraps up our introduction to Pandas! We've covered how to load, explore, manipulate, and save data using Pandas. If you missed the previous videos on setting up your environment, be sure to check them out. I hope this tutorial helps you get started with your data analysis projects. Don't forget to like, share, and subscribe for more content. Thanks for watching!"