In [2]:
import pandas as pd

In [3]:
# Creating a Series from a list
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print("Series from list:\n", series)


Series from list:
 0    1
1    2
2    3
3    4
4    5
dtype: int64
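Every Series pairs its values with an index. The 0 to 4 on the left of the output above is the default integer index. Below is a minimal sketch of supplying your own index and reading values either by label or by position. The letter labels are arbitrary and chosen only for illustration.

```python
import pandas as pd

# Supply explicit index labels instead of the default 0, 1, 2, ...
# (the letters 'a'-'e' are arbitrary, chosen for illustration)
series = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(series['c'])         # label-based access -> 3
print(series.iloc[2])      # position-based access -> 3
print(series[series > 3])  # boolean filtering keeps the original labels 'd' and 'e'
```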


In [4]:
# Creating a DataFrame from a dictionary
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32]
}
df = pd.DataFrame(data)
print("\nDataFrame from dictionary:\n", df)


DataFrame from dictionary:
     Name  Age
0   John   28
1   Anna   24
2  Peter   35
3  Linda   32


Operations on DataFrames:

Selecting data:

In [5]:
# Selecting a single column (returns a Series)
ages = df['Age']
print("\nSelected column (Age):\n", ages)

# Selecting multiple columns
subset = df[['Name', 'Age']]
print("\nSubset of DataFrame:\n", subset)


Selected column (Age):
 0    28
1    24
2    35
3    32
Name: Age, dtype: int64

Subset of DataFrame:
     Name  Age
0   John   28
1   Anna   24
2  Peter   35
3  Linda   32
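The two outputs above differ in type, not just in layout. `df['Age']` returns a Series, which is why its output ends with a `Name: Age, dtype: int64` footer. A list of column names returns a DataFrame. The short sketch below rebuilds the same DataFrame and checks this. It also uses `.loc`, which selects rows and columns by label and, unlike Python slicing, includes both ends of a label slice.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32]
})

ages = df['Age']          # one label in single brackets -> Series
ages_frame = df[['Age']]  # a list of labels, even of length one -> DataFrame
print(type(ages).__name__, type(ages_frame).__name__)  # Series DataFrame

# .loc[rows, columns] selects by label; the label slice 1:2 includes row 2
rows = df.loc[1:2, ['Name', 'Age']]
print(rows)
```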


Filtering data:

In [6]:
filtered_df = df[df['Age'] > 30]
print("\nFiltered DataFrame (Age > 30):\n", filtered_df)


Filtered DataFrame (Age > 30):
     Name  Age
2  Peter   35
3  Linda   32
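The filter works because `df['Age'] > 30` produces a boolean Series, and indexing with it keeps only the rows marked `True`. You can combine conditions with `&` (and), `|` (or) and `~` (not). Each condition needs its own parentheses, because these operators bind more tightly than comparisons. The sketch below applies this to the same data. The age bounds and names are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32]
})

# Parentheses are required around each comparison
mask = (df['Age'] > 25) & (df['Age'] < 33)
in_range = df[mask]
print(in_range)

# isin() keeps rows whose value appears in the given list
selected = df[df['Name'].isin(['Anna', 'Linda'])]
print(selected)

# ~ negates a mask: everyone not selected above
others = df[~df['Name'].isin(['Anna', 'Linda'])]
print(others)
```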


Modifying data:

In [7]:
df.loc[0, 'Age'] = 29  # Update a single value
print("\nDataFrame after modifying age of the first entry:\n", df)


DataFrame after modifying age of the first entry:
     Name  Age
0   John   29
1   Anna   24
2  Peter   35
3  Linda   32
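`.loc` also accepts a boolean mask as its row selector, so a single assignment can update every matching row. Prefer this over chained indexing such as `df['Age'][0] = 29`. Chained indexing may act on a temporary copy and leave `df` unchanged, and pandas warns about this in many versions. The sketch below assumes an arbitrary rule that adds one year to everyone over 30. It also adds a hypothetical `Senior` column.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32]
})

# Boolean mask as the row selector: update every matching row at once
over_30 = df['Age'] > 30
df.loc[over_30, 'Age'] = df.loc[over_30, 'Age'] + 1  # arbitrary rule for illustration

# Assigning to a new label adds a column (hypothetical threshold of 33)
df['Senior'] = df['Age'] >= 33
print(df)
```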


2. Data Handling with Pandas

Handling missing data:

In [8]:
# None in a numeric column is stored as NaN, so both columns become float64
df_with_missing = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})


Checking for missing data:

In [9]:
print("\nDataFrame with missing values:\n", df_with_missing)
print("\nChecking for missing data:\n", df_with_missing.isnull())


DataFrame with missing values:
      A    B
0  1.0  NaN
1  2.0  2.0
2  NaN  3.0
3  4.0  4.0

Checking for missing data:
        A      B
0  False   True
1  False  False
2   True  False
3  False  False
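The output above shows two details. First, the `None` entries are stored as `NaN`. Second, because `NaN` is a float, both integer columns become `float64`, which is why the values print as `1.0` and `2.0`. A full boolean table is hard to read for larger frames, so a common pattern is to summarize it instead. A minimal sketch:

```python
import pandas as pd

df_with_missing = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})

print(df_with_missing.dtypes)  # both columns are float64 because of NaN

# True counts as 1, so summing the boolean frame counts missing values per column
missing_per_column = df_with_missing.isnull().sum()
print(missing_per_column)

# any(axis=1) marks rows with at least one missing value
rows_with_missing = df_with_missing[df_with_missing.isnull().any(axis=1)]
print(rows_with_missing)
```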


Filling missing values with a specific value:

In [10]:
df_filled = df_with_missing.fillna(0)
print("\nDataFrame after filling missing values:\n", df_filled)


DataFrame after filling missing values:
      A    B
0  1.0  0.0
1  2.0  2.0
2  0.0  3.0
3  4.0  4.0
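`fillna(0)` replaces every gap with the same constant. That is not always appropriate, because zeros can distort averages. The sketch below shows two alternatives. The first fills each column with its own value through a dict: the mean of `A` and a constant for `B`, both chosen only for illustration. The second drops incomplete rows with `dropna()`.

```python
import pandas as pd

df_with_missing = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})

# Per-column fill values: the mean of A (NaN is skipped) and 0 for B
df_filled = df_with_missing.fillna({'A': df_with_missing['A'].mean(), 'B': 0})
print(df_filled)

# dropna() removes any row containing at least one missing value
df_dropped = df_with_missing.dropna()
print(df_dropped)
```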


Conversion of data types:

In [11]:
df['Age'] = df['Age'].astype(float)
print("\nDataFrame with Age converted to float:\n", df)


DataFrame with Age converted to float:
     Name   Age
0   John  29.0
1   Anna  24.0
2  Peter  35.0
3  Linda  32.0
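`astype` changes a column's type when every value can be converted. For messy text columns, `pd.to_numeric` with `errors='coerce'` is more forgiving: it turns values it cannot parse into `NaN`. A sketch, in which the raw strings are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [29, 24, 35, 32]
})

print(df['Age'].dtype)               # int64 before conversion
df['Age'] = df['Age'].astype(float)
print(df['Age'].dtype)               # float64 after conversion

# errors='coerce' turns values that cannot be parsed into NaN
raw = pd.Series(['10', '20', 'n/a'])  # hypothetical raw text input
converted = pd.to_numeric(raw, errors='coerce')
print(converted)
```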


3. Data Analysis with Pandas

In [12]:
# describe() summarizes numeric columns only by default, so 'Name' is omitted
summary_stats = df.describe()
print("\nSummary statistics of the DataFrame:\n", summary_stats)


Summary statistics of the DataFrame:
              Age
count   4.000000
mean   30.000000
std     4.690416
min    24.000000
25%    27.750000
50%    30.500000
75%    32.750000
max    35.000000
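`describe()` reports the sample standard deviation (divisor n - 1) and linearly interpolated quartiles. For the ages 29, 24, 35 and 32, the mean is 120 / 4 = 30. The squared deviations sum to 1 + 36 + 25 + 4 = 66, so the standard deviation is sqrt(66 / 3) = sqrt(22) ≈ 4.690416, matching the `std` row. For the 25% quartile, the sorted values are 24, 29, 32, 35, and position 0.75 lies between 24 and 29. That gives 24 + 0.75 × 5 = 27.75. The sketch below reproduces these numbers, plus the population version (divisor n) for comparison:

```python
import pandas as pd

ages = pd.Series([29, 24, 35, 32], dtype=float)

sample_std = ages.std()             # default ddof=1, as used by describe()
population_std = ages.std(ddof=0)   # divisor n instead of n - 1
quartiles = ages.quantile([0.25, 0.5, 0.75])  # linear interpolation by default

print(round(sample_std, 6))     # 4.690416
print(round(population_std, 6))
print(quartiles)
```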


Grouping Data and Applying Aggregate Functions:

In [13]:
# Each name appears once, so each group's mean is just that person's age
grouped = df.groupby('Name').agg({'Age': 'mean'})
print("\nGrouped data with mean age:\n", grouped)


Grouped data with mean age:
         Age
Name       
Anna   24.0
John   29.0
Linda  32.0
Peter  35.0
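In this example every name occurs exactly once, so each group holds a single row and the "mean" is simply that person's age. Grouping becomes useful when a key repeats. The sketch below adds a hypothetical `Team` column, invented for illustration. It uses named aggregation to compute several statistics per group.

```python
import pandas as pd

# Hypothetical data: the 'Team' column is invented for illustration
df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Team': ['Red', 'Blue', 'Red', 'Blue'],
    'Age': [29, 24, 35, 32]
})

# Named aggregation: output_column=(input_column, aggregate_function)
summary = df.groupby('Team').agg(
    mean_age=('Age', 'mean'),
    oldest=('Age', 'max'),
    members=('Name', 'count'),
)
print(summary)
```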


Merging two DataFrames:

In [14]:
df1 = pd.DataFrame({'Key': ['A', 'B', 'C'], 'Value': [1, 2, 3]})
df2 = pd.DataFrame({'Key': ['A', 'B', 'D'], 'Value': [4, 5, 6]})
# how='outer' keeps keys from both frames; the two 'Value' columns get _x/_y suffixes
merged_df = pd.merge(df1, df2, on='Key', how='outer')
print("\nMerged DataFrame:\n", merged_df)


Merged DataFrame:
   Key  Value_x  Value_y
0   A      1.0      4.0
1   B      2.0      5.0
2   C      3.0      NaN
3   D      NaN      6.0
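The `how` argument decides which keys survive the merge:

- `'inner'` keeps only keys found in both frames.
- `'left'` keeps all keys of `df1`.
- `'outer'` keeps every key and fills gaps with `NaN`, as above.

The sketch below compares them. It passes custom `suffixes` in place of the default `_x`/`_y`, and uses `indicator=True` to show where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'Key': ['A', 'B', 'C'], 'Value': [1, 2, 3]})
df2 = pd.DataFrame({'Key': ['A', 'B', 'D'], 'Value': [4, 5, 6]})

# Compare which keys each join type keeps
results = {
    how: pd.merge(df1, df2, on='Key', how=how, suffixes=('_df1', '_df2'))
    for how in ['inner', 'left', 'outer']
}
for how, merged in results.items():
    print(how, merged['Key'].tolist())

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
flagged = pd.merge(df1, df2, on='Key', how='outer', indicator=True)
print(flagged[['Key', '_merge']])
```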


Concatenation:

In [15]:
# Stack the rows of df1 and df2; ignore_index=True renumbers the rows from 0
concatenated_df = pd.concat([df1, df2], ignore_index=True)
print("\nConcatenated DataFrame:\n", concatenated_df)


Concatenated DataFrame:
   Key  Value
0   A      1
1   B      2
2   C      3
3   A      4
4   B      5
5   D      6
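Unlike `merge`, `concat` does not match rows by `Key`. It simply stacks them, so `A` and `B` appear twice. The `ignore_index=True` argument is what renumbers the rows 0 to 5. The sketch below shows what happens without it, and how `keys=` can label each block instead:

```python
import pandas as pd

df1 = pd.DataFrame({'Key': ['A', 'B', 'C'], 'Value': [1, 2, 3]})
df2 = pd.DataFrame({'Key': ['A', 'B', 'D'], 'Value': [4, 5, 6]})

# Without ignore_index the original row labels are kept, so 0, 1, 2 repeat
stacked = pd.concat([df1, df2])
print(stacked.index.tolist())  # [0, 1, 2, 0, 1, 2]

# keys= tags each input frame, producing a two-level row index
labelled = pd.concat([df1, df2], keys=['df1', 'df2'])
print(labelled.loc['df2'])
```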


4. Application in Data Science

Advantages of Using Pandas:

Efficiency: Pandas is built on top of NumPy. Column operations are therefore vectorized and run in compiled code rather than in Python loops, which makes its data structures both fast and easy to use.

Data Manipulation: It allows quick and easy data manipulation, cleaning, and transformation.

Versatility: It can read and write many formats, including CSV, Excel, and JSON files and SQL databases. This makes it suitable for a wide range of data science tasks.

Integration: It integrates well with other Python libraries, such as Matplotlib and Seaborn for visualization and scikit-learn for machine learning.
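To illustrate the file-format support, the sketch below writes a DataFrame to CSV and reads it back. An in-memory `io.StringIO` buffer stands in for a real path, such as a hypothetical `people.csv`, so nothing touches the disk. Excel and SQL follow the same pattern, though they are not shown here. `read_excel`/`to_excel` need an engine such as openpyxl, and `read_sql`/`to_sql` need a database connection.

```python
import io

import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [29, 24, 35, 32]
})

# In-memory buffer in place of a file path (hypothetical 'people.csv')
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False: do not write the row labels
buffer.seek(0)                  # rewind before reading
restored = pd.read_csv(buffer)
print(restored.equals(df))      # True: same values, dtypes and labels

# The same data as JSON, one object per row
json_text = df.to_json(orient='records')
print(json_text)
```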

Real-World Examples:

Data Cleaning: Pandas is often used to clean messy datasets by handling missing values, removing duplicates, and standardizing data formats.

Exploratory Data Analysis (EDA): It helps in quickly generating summary statistics, visualizing data distributions, and uncovering patterns in the data.

Financial Analysis: Its time series tools, such as date-based indexing, resampling, and rolling-window statistics, suit the price and return data common in financial datasets.
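The sketch below shows two of these uses on small hypothetical data. The first standardizes inconsistent text before removing duplicate rows. The second computes a moving average and day-over-day returns from a daily price series. All names, values and dates are invented for illustration.

```python
import pandas as pd

# Hypothetical messy records: stray spaces, mixed case and a hidden duplicate
raw = pd.DataFrame({
    'Name': [' john', 'Anna ', 'JOHN', 'Peter'],
    'Age': [29, 24, 29, 35]
})

# Standardize the text first, so ' john' and 'JOHN' both become 'John'
raw['Name'] = raw['Name'].str.strip().str.title()
# Then drop rows that are now exact duplicates (the first occurrence is kept)
clean = raw.drop_duplicates().reset_index(drop=True)
print(clean)

# Hypothetical daily closing prices indexed by date
prices = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 107.0],
    index=pd.date_range('2024-01-01', periods=5, freq='D'),
)
moving_avg = prices.rolling(window=3).mean()  # 3-day moving average; first 2 are NaN
returns = prices / prices.shift(1) - 1        # day-over-day return; first is NaN
print(moving_avg)
print(returns)
```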

Conclusion

Pandas gives data scientists a powerful toolset for handling and analyzing data.
Its efficient handling of in-memory tabular data and its integration with other Python libraries make it a standard tool for data science professionals.

You can now use these examples and explanations to get hands-on experience with Pandas and to understand its central role in data science.