This Jupyter notebook introduces Pandas in four parts: getting familiar with the library, handling data, analyzing data, and the role of Pandas in data science.

**Introduction to Pandas and Data Handling in Python**

1. **Getting Familiar with Pandas**

1.1 **Introduction to Pandas**

Pandas is a powerful library for data manipulation and analysis in Python. It provides two primary data structures:
- **Series**: A one-dimensional labeled array capable of holding any data type.
- **DataFrame**: A two-dimensional labeled data structure with columns of potentially different types.


1.2 **Creating Series and DataFrames**

In [12]:
import pandas as pd

**Creating a Series**

In [13]:
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
print("Series:\n", series)

Series:
 0    10
1    20
2    30
3    40
4    50
dtype: int64
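
The Series above uses the default integer index (0 to 4). The "labeled" part of the definition is easier to see with an explicit index, so here is a minimal sketch; the labels `'a'`, `'b'`, `'c'` are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical labels chosen for illustration only.
labeled = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Label-based access goes through the index; .iloc accesses by position.
by_label = labeled['b']
by_position = labeled.iloc[1]

print("By label 'b':", by_label)
print("By position 1:", by_position)
```

Both lookups reach the same element, which is why a Series can be thought of as an array and a dictionary at the same time.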


**Creating a DataFrame from a dictionary**

In [14]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)
print("\nDataFrame:\n", df)


DataFrame:
       Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston


1.3 **Basic Operations**

In [15]:
# Selecting a column
print("\nSelect 'Name' column:\n", df['Name'])

# Filtering rows
print("\nFilter rows where Age > 25:\n", df[df['Age'] > 25])

# Modifying data
df['Age'] = df['Age'] + 1
print("\nDataFrame with modified Age:\n", df)


Select 'Name' column:
 0      Alice
1        Bob
2    Charlie
3      David
Name: Name, dtype: object

Filter rows where Age > 25:
     Name  Age         City
1    Bob   27  Los Angeles
3  David   32      Houston

DataFrame with modified Age:
       Name  Age         City
0    Alice   25     New York
1      Bob   28  Los Angeles
2  Charlie   23      Chicago
3    David   33      Houston
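
The operations above select whole columns or filter on one condition. As a sketch of the next step, the cell below rebuilds the same four-person table (under the assumed name `people`, so it does not overwrite `df`) and uses `.loc`, `.iloc`, and a combined condition.

```python
import pandas as pd

# Same data as the DataFrame created in section 1.2, under a separate name.
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# .loc takes a row mask and a list of column labels.
older = people.loc[people['Age'] > 25, ['Name', 'City']]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
mid = people[(people['Age'] > 22) & (people['Age'] < 30)]

# .iloc selects by integer position: first two rows, first column.
first_names = people.iloc[:2, 0]

print(older)
print(mid)
print(first_names)
```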


2. **Data Handling with Pandas**

2.1 **Reading Data from Files**

In [16]:
# Reading a CSV file
df = pd.read_csv('Toy-Sales-dataset.csv')
print("Data from CSV:\n", df.head())

Data from CSV:
    Month  Sales  PromExp  Price  AdExp
0      1  73959    61.13   8.75  50.04
1      2  71544    60.19   8.99  50.74
2      3  78587    59.16   7.50  50.14
3      4  80364    60.38   7.25  50.27
4      5  78771    59.71   7.40  51.25
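
`pd.read_csv` accepts options that control which columns are loaded and which one becomes the index. The sketch below embeds the first three rows of the table above as text so that it runs without `Toy-Sales-dataset.csv`; the variable names `csv_text` and `sample` are assumptions made for this example.

```python
import io
import pandas as pd

# First rows of the table printed above, embedded as text for a self-contained example.
csv_text = """Month,Sales,PromExp,Price,AdExp
1,73959,61.13,8.75,50.04
2,71544,60.19,8.99,50.74
3,78587,59.16,7.50,50.14
"""

# usecols limits the columns that are parsed; index_col turns one of them into the row index.
sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['Month', 'Sales', 'Price'],
    index_col='Month',
)

print(sample)
print(sample.dtypes)
```

Passing a file path instead of the `io.StringIO` object reads from disk in the same way.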


2.2 **Handling Missing Data**

In [17]:
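# Handling missing values: ffill() returns a new DataFrame, so assign the result
df = df.ffill()  # Forward fill: copy the last valid value into each gap
print("\nDataFrame with missing values handled:\n", df)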

# Removing duplicates
df = df.drop_duplicates()
print("\nDataFrame with duplicates removed:\n", df)



DataFrame with missing values handled:
     Month  Sales  PromExp  Price  AdExp
0       1  73959    61.13   8.75  50.04
1       2  71544    60.19   8.99  50.74
2       3  78587    59.16   7.50  50.14
3       4  80364    60.38   7.25  50.27
4       5  78771    59.71   7.40  51.25
5       6  71986    59.88   8.50  50.65
6       7  74885    60.14   8.40  50.87
7       8  73345    60.08   7.90  50.15
8       9  76659    59.90   7.25  48.24
9      10  71880    59.68   8.70  50.19
10     11  73598    59.83   8.40  51.11
11     12  74893    59.77   8.10  51.49
12     13  69003    59.29   8.40  50.10
13     14  78542    60.40   7.40  49.24
14     15  72543    59.89   8.00  50.04
15     16  74247    60.06   8.30  49.46
16     17  76253    60.51   8.10  51.62
17     18  72582    58.93   8.20  49.78
18     19  69022    60.09   8.99  48.60
19     20  76200    61.00   7.99  49.00
20     21  69701    59.00   8.50  48.00
21     22  77005    59.50   7.90  54.00
22     23  70987    58.00   7.99  48.70
23     24  75643    60.50   8.25  50.00
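
The summary statistics in section 3.1 report a count of 24 for every column, so the toy sales data appears to have no gaps and the forward fill leaves it unchanged. To see the missing-data tools actually do something, here is a small sketch on a made-up frame; the name `gaps` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical values with gaps, for illustration only.
gaps = pd.DataFrame({
    'Sales': [100.0, np.nan, 120.0, np.nan],
    'Price': [8.0, 8.5, np.nan, 9.0],
})

# Count missing values per column before deciding how to treat them.
missing_per_column = gaps.isna().sum()

# Each method returns a new DataFrame; gaps itself is not modified.
filled_forward = gaps.ffill()            # carry the previous value forward
filled_mean = gaps.fillna(gaps.mean())   # replace gaps with the column mean
complete_rows = gaps.dropna()            # keep only rows without any gap

print(missing_per_column)
print(filled_forward)
print(filled_mean)
print(complete_rows)
```

Which strategy fits depends on the data: forward fill suits ordered records such as monthly figures, while dropping rows is safer when a filled value would be misleading.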

2.3 **Data Type Conversions**

In [18]:
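# Converting data types (Price is already float64 after read_csv, so this
# call changes nothing here; it only shows the astype syntax)
df['Price'] = df['Price'].astype(float)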
print("\nDataFrame with Price as float:\n", df.dtypes)



DataFrame with Price as float:
 Month        int64
Sales        int64
PromExp    float64
Price      float64
AdExp      float64
dtype: object
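
Conversions matter most when numbers arrive as text. The following sketch assumes a hypothetical frame `raw` in which every column was read as strings and one price is the placeholder `'n/a'`.

```python
import pandas as pd

# Hypothetical raw input: numbers stored as strings, one unparseable entry.
raw = pd.DataFrame({
    'Month': ['1', '2', '3'],
    'Price': ['8.75', '8.99', 'n/a'],
})

# astype(int) works when every value is a valid integer string.
raw['Month'] = raw['Month'].astype(int)

# to_numeric with errors='coerce' turns unparseable entries into NaN
# instead of raising an error.
raw['Price'] = pd.to_numeric(raw['Price'], errors='coerce')

print(raw)
print(raw.dtypes)
```

Calling `astype(float)` on the `Price` column directly would raise a `ValueError` because of the `'n/a'` entry, which is the case `errors='coerce'` is meant for.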


3. **Data Analysis with Pandas**

3.1 **Generating Summary Statistics**

In [19]:
# Summary statistics
print("\nSummary statistics:\n", df.describe())



Summary statistics:
            Month         Sales    PromExp      Price      AdExp
count  24.000000     24.000000  24.000000  24.000000  24.000000
mean   12.500000  74258.291667  59.875833   8.131667  50.153333
std     7.071068   3164.394612   0.682215   0.506666   1.279119
min     1.000000  69003.000000  58.000000   7.250000  48.000000
25%     6.750000  71959.500000  59.635000   7.900000  49.405000
50%    12.500000  74103.000000  59.895000   8.150000  50.120000
75%    18.250000  76354.500000  60.237500   8.425000  50.772500
max    24.000000  80364.000000  61.130000   8.990000  54.000000
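
`describe()` computes a fixed set of statistics. Individual statistics, a custom selection, or the correlation between two columns can be requested directly. The sketch below uses the first five rows of `Sales` and `Price` from the table in section 2.1, under the assumed name `toy`.

```python
import pandas as pd

# First five rows of Sales and Price from the table in section 2.1.
toy = pd.DataFrame({
    'Sales': [73959, 71544, 78587, 80364, 78771],
    'Price': [8.75, 8.99, 7.50, 7.25, 7.40],
})

# A single statistic for one column.
mean_sales = toy['Sales'].mean()

# A chosen set of statistics for every column.
summary = toy.agg(['min', 'max', 'median'])

# Pearson correlation between two columns.
corr = toy['Sales'].corr(toy['Price'])

print("Mean sales:", mean_sales)
print(summary)
print("Correlation between Sales and Price:", corr)
```

On these five rows the correlation is negative, meaning months with a lower price had higher sales; five rows are too few to draw conclusions about the full dataset.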


3.2 **Grouping and Aggregating Data**

In [20]:
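# Grouping and aggregating (every Sales value is unique in this dataset, so each
# group holds exactly one row and the "mean" simply returns that row)
grouped = df.groupby('Sales').mean()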
print("\nGrouped data by 'Sales':\n", grouped)



Grouped data by 'Sales':
        Month  PromExp  Price  AdExp
Sales                              
69003   13.0    59.29   8.40  50.10
69022   19.0    60.09   8.99  48.60
69701   21.0    59.00   8.50  48.00
70987   23.0    58.00   7.99  48.70
71544    2.0    60.19   8.99  50.74
71880   10.0    59.68   8.70  50.19
71986    6.0    59.88   8.50  50.65
72543   15.0    59.89   8.00  50.04
72582   18.0    58.93   8.20  49.78
73345    8.0    60.08   7.90  50.15
73598   11.0    59.83   8.40  51.11
73959    1.0    61.13   8.75  50.04
74247   16.0    60.06   8.30  49.46
74885    7.0    60.14   8.40  50.87
74893   12.0    59.77   8.10  51.49
75643   24.0    60.50   8.25  50.00
76200   20.0    61.00   7.99  49.00
76253   17.0    60.51   8.10  51.62
76659    9.0    59.90   7.25  48.24
77005   22.0    59.50   7.90  54.00
78542   14.0    60.40   7.40  49.24
78587    3.0    59.16   7.50  50.14
78771    5.0    59.71   7.40  51.25
80364    4.0    60.38   7.25  50.27
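
Grouping is more informative when the key repeats across rows. As a sketch, the cell below takes the first six rows of the dataset and derives a categorical column `PriceBand`; both the column name and the 8.0 threshold are assumptions made for this example, not part of the original data.

```python
import numpy as np
import pandas as pd

# First six rows of Month, Sales and Price from the dataset shown earlier.
toy = pd.DataFrame({
    'Month': [1, 2, 3, 4, 5, 6],
    'Sales': [73959, 71544, 78587, 80364, 78771, 71986],
    'Price': [8.75, 8.99, 7.50, 7.25, 7.40, 8.50],
})

# Hypothetical category: 'high' when Price is at least 8.0, otherwise 'low'.
toy['PriceBand'] = np.where(toy['Price'] >= 8.0, 'high', 'low')

# Several aggregates per group for a single column.
by_band = toy.groupby('PriceBand')['Sales'].agg(['count', 'mean', 'max'])

print(by_band)
```

Each band now contains three months, so the mean and maximum summarize real groups rather than single rows.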


3.3 **Merging, Joining, and Concatenating**

In [21]:
# Merging DataFrames
df1 = pd.DataFrame({'ID': [1, 2], 'Value': [100, 200]})
df2 = pd.DataFrame({'ID': [1, 2], 'Value': [300, 400]})
merged_df = pd.merge(df1, df2, on='ID', suffixes=('_left', '_right'))
print("\nMerged DataFrame:\n", merged_df)



Merged DataFrame:
    ID  Value_left  Value_right
0   1         100          300
1   2         200          400
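
The cell above covers merging on a shared key. The other two operations named in this section can be sketched as follows, reusing `df1` and `df2`; the frames `names` and `scores` are hypothetical additions for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2], 'Value': [100, 200]})
df2 = pd.DataFrame({'ID': [1, 2], 'Value': [300, 400]})

# Concatenating: stack rows of frames that share the same columns.
stacked = pd.concat([df1, df2], ignore_index=True)

# Joining: align on the index; a left join keeps every row of the left frame.
names = pd.DataFrame({'Label': ['first', 'third']}, index=[1, 3])  # hypothetical
joined = df1.set_index('ID').join(names, how='left')

# Merging with how='outer' keeps keys that appear in either frame.
scores = pd.DataFrame({'ID': [2, 3], 'Score': [0.5, 0.9]})  # hypothetical
outer = pd.merge(df1, scores, on='ID', how='outer')

print(stacked)
print(joined)
print(outer)
```

Rows without a match on the other side receive `NaN` in the columns that come from it, as `ID` 2 does in `joined` and `ID` 1 and 3 do in `outer`.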


4. **Application in Data Science**

4.1 **Advantages of Pandas**
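- **Ease of Use**: Pandas provides a simple and intuitive interface for working with data.
- **Efficiency**: Operations on DataFrames and Series are vectorized and run on NumPy arrays, which avoids slow row-by-row Python loops.
- **Flexibility**: Pandas reads and writes many data formats and sources, such as CSV, Excel, JSON, and SQL databases.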
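
To make the efficiency point concrete, here is a minimal sketch that compares a column-wise operation with the equivalent Python loop. It uses the first three rows of `PromExp` and `AdExp` from the dataset; the derived column `TotalExp` is a hypothetical name.

```python
import pandas as pd

# First three rows of PromExp and AdExp from the dataset shown earlier.
toy = pd.DataFrame({
    'PromExp': [61.13, 60.19, 59.16],
    'AdExp': [50.04, 50.74, 50.14],
})

# Vectorized: one expression adds the two columns element by element.
toy['TotalExp'] = toy['PromExp'] + toy['AdExp']

# Equivalent explicit loop, shown only for comparison.
looped = [p + a for p, a in zip(toy['PromExp'], toy['AdExp'])]

print(toy)
print(looped)
```

Both versions give the same numbers. The vectorized form is shorter and, on large frames, typically much faster because the loop runs in compiled code rather than in the Python interpreter.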

4.2 **Real-World Examples**
- **Data Cleaning**: Handling missing values, filtering data, and removing duplicates are common tasks in data preprocessing.
- **Exploratory Data Analysis (EDA)**: Generating summary statistics and visualizing data helps in understanding trends and patterns.
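
In practice, these cleaning and exploration steps are often chained into one expression. The sketch below assumes a hypothetical frame `raw` that reuses a few sales figures from the dataset but deliberately adds a duplicated row and a missing value; the 72000 threshold is likewise arbitrary.

```python
import numpy as np
import pandas as pd

# Hypothetical messy input: one duplicated row and one missing Sales value.
raw = pd.DataFrame({
    'Month': [1, 2, 2, 3, 4],
    'Sales': [73959, 71544, 71544, np.nan, 80364],
})

# Each method returns a new DataFrame, so the steps can be chained.
clean = (
    raw.drop_duplicates()            # remove the repeated Month 2 row
       .dropna(subset=['Sales'])     # drop rows without a Sales value
       .query('Sales > 72000')       # keep rows above an arbitrary threshold
       .reset_index(drop=True)
)

# A first exploratory look at the cleaned column.
stats = clean['Sales'].describe()

print(clean)
print(stats)
```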

Pandas is essential in data science because it handles in-memory datasets efficiently and expresses complex data manipulations in a few lines of code.