In [2]:
import pandas as pd

In [None]:
# Creating a Series from a list
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
print("Series from list:\n", series)
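
A Series does not have to use the default 0, 1, 2, ... index. The sketch below uses made-up fruit prices as illustrative data to show custom index labels, label-based versus position-based access, and building a Series from a dictionary:

```python
import pandas as pd

# Hypothetical example data: the labels become the index
prices = pd.Series([0.5, 0.25, 0.75], index=["apple", "banana", "cherry"])
print(prices)

# Label-based lookup versus position-based lookup
print(prices["banana"])
print(prices.iloc[0])

# From a dictionary: keys become the index, values become the data
stock = pd.Series({"apple": 10, "banana": 0, "cherry": 25})
print(stock)
```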


In [None]:
# Creating a DataFrame from a dictionary
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32]
}
df = pd.DataFrame(data)
print("\nDataFrame from dictionary:\n", df)
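
After creating a DataFrame, it is usually worth checking its size, column types and first rows. A minimal sketch using the same Name/Age data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32]
})

print(df.shape)             # (number of rows, number of columns)
print(df.dtypes)            # data type of each column
print(df.head(2))           # first two rows
print(df.columns.tolist())  # column names as a plain list
```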

Operations on DataFrames:

Selecting data:

In [None]:
# Selecting a single column
ages = df['Age']
print("\nSelected column (Age):\n", ages)

# Selecting multiple columns
subset = df[['Name', 'Age']]
print("\nSubset of DataFrame:\n", subset)
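
Rows are selected with `loc` (by index label) or `iloc` (by integer position). The sketch below, on the same sample data, also shows one common pitfall: `loc` slices include the end label, while `iloc` slices exclude the end position.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32]
})

# Label-based: row label 1, column 'Name'
print(df.loc[1, 'Name'])

# Position-based: first two rows of the first column
print(df.iloc[0:2, 0])

# loc includes the end label (rows 0, 1, 2); iloc excludes it (rows 0, 1)
loc_slice = df.loc[0:2, 'Age']
iloc_slice = df.iloc[0:2, 1]
print(loc_slice)
print(iloc_slice)
```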

Filtering data:

In [None]:
filtered_df = df[df['Age'] > 30]
print("\nFiltered DataFrame (Age > 30):\n", filtered_df)
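
Several conditions can be combined in one filter. A minimal sketch on the same data: `&` means "and", `|` means "or", and each condition must be wrapped in parentheses. `isin()` and `query()` are shown as alternatives.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32]
})

# Both conditions must hold; parentheses are required around each one
both = df[(df['Age'] > 25) & (df['Age'] < 33)]
print(both)

# Membership test against a list of values
named = df[df['Name'].isin(['Anna', 'Linda'])]
print(named)

# The same filter as 'both', written as a query string
same_as_both = df.query('Age > 25 and Age < 33')
print(same_as_both)
```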

Modifying data:

In [None]:
df.loc[0, 'Age'] = 29  # Update a single value
print("\nDataFrame after modifying age of the first entry:\n", df)
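
Beyond updating one cell, two other common modifications are adding a derived column and updating every row that matches a condition. A sketch on the same sample data; the `Age_in_months` column name is purely illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32]
})

# New column computed from an existing one (hypothetical column name)
df['Age_in_months'] = df['Age'] * 12

# Update every row matching a boolean mask
df.loc[df['Name'] == 'Anna', 'Age'] = 25
print(df)
```

Using a boolean mask inside `loc` assigns directly into `df`, which avoids pandas' "setting a value on a copy" warning that chained indexing such as `df[mask]['Age'] = 25` can trigger.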

2. Data Handling with Pandas

Handling missing data:

In [None]:
df_with_missing = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})


Checking for missing data:

In [None]:
print("\nDataFrame with missing values:\n", df_with_missing)
print("\nChecking for missing data:\n", df_with_missing.isnull())
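
On larger frames, a full True/False grid is hard to read, so missing values are usually summarised instead. A minimal sketch with the same two-column example:

```python
import pandas as pd

df_with_missing = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})

# Count of missing values per column (True counts as 1)
missing_per_column = df_with_missing.isnull().sum()
print(missing_per_column)

# Rows that contain at least one missing value
rows_with_missing = df_with_missing[df_with_missing.isnull().any(axis=1)]
print(rows_with_missing)
```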

Filling missing values with a specific value:

In [None]:
df_filled = df_with_missing.fillna(0)
print("\nDataFrame after filling missing values:\n", df_filled)
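
Filling with 0 is only one option, and it can distort statistics such as the mean. The sketch below shows three common alternatives on the same data: dropping incomplete rows, filling with each column's mean, and using a different constant per column (the constants are arbitrary):

```python
import pandas as pd

df_with_missing = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})

# Drop any row that contains a missing value
df_dropped = df_with_missing.dropna()

# Fill each column's gaps with that column's mean (NaNs are skipped when computing it)
df_mean_filled = df_with_missing.fillna(df_with_missing.mean())

# A different fill value per column (arbitrary illustrative constants)
df_per_column = df_with_missing.fillna({'A': -1, 'B': 0})

print(df_dropped, df_mean_filled, df_per_column, sep="\n\n")
```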

Conversion of data types:

In [None]:
df['Age'] = df['Age'].astype(float)
print("\nDataFrame with Age converted to float:\n", df)
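
`astype(float)` raises an error if any value cannot be parsed. When numbers arrive as text with occasional bad entries, `pd.to_numeric` with `errors='coerce'` converts what it can and marks the rest as missing. A sketch with a hypothetical column of strings:

```python
import pandas as pd

# Hypothetical numbers stored as text, with one unparseable entry
raw = pd.Series(['28', '24', 'unknown', '32'])

# Unparseable values become NaN instead of raising ValueError
ages = pd.to_numeric(raw, errors='coerce')
print(ages)
print(ages.dtype)  # float64, because NaN is a float value
```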

3. Data Analysis with Pandas

Summary statistics:

In [None]:
summary_stats = df.describe()
print("\nSummary statistics of the DataFrame:\n", summary_stats)
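
By default `describe()` summarises only numeric columns, so `Name` does not appear in its output. A minimal sketch showing how to include all columns and how to compute individual statistics directly:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32]
})

# Numeric columns only (the default)
print(df.describe())

# include='all' adds count/unique/top/freq for non-numeric columns
print(df.describe(include='all'))

# Individual statistics
print(df['Age'].mean(), df['Age'].median(), df['Age'].std())
```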

Grouping Data and Applying Aggregate Functions:

In [None]:
# Mean age per name; every name is unique here, so each group's mean equals that person's age
grouped = df.groupby('Name').agg({'Age': 'mean'})
print("\nGrouped data with mean age:\n", grouped)
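
Grouping becomes more informative when the key repeats. The sketch below uses hypothetical department data and computes several aggregates per group at once:

```python
import pandas as pd

# Hypothetical data where the grouping key repeats
staff = pd.DataFrame({
    'Department': ['Sales', 'IT', 'Sales', 'IT', 'HR'],
    'Age': [28, 24, 35, 32, 41]
})

# One row per department, one column per aggregate
summary = staff.groupby('Department')['Age'].agg(['mean', 'min', 'max', 'count'])
print(summary)
```

Group keys are sorted by default, so the rows appear as HR, IT, Sales; pass `sort=False` to `groupby` to keep first-appearance order.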

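Merging two DataFrames:

In [None]:
df1 = pd.DataFrame({'Key': ['A', 'B', 'C'], 'Value': [1, 2, 3]})
df2 = pd.DataFrame({'Key': ['A', 'B', 'D'], 'Value': [4, 5, 6]})
# how='outer' keeps keys from both frames; cells with no match become NaN.
# suffixes labels the overlapping 'Value' columns by their source frame.
merged_df = pd.merge(df1, df2, on='Key', how='outer', suffixes=('_df1', '_df2'))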
print("\nMerged DataFrame:\n", merged_df)
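
The `how` parameter controls which keys survive the merge. A sketch comparing `inner` and `left` joins on the same two frames, plus `indicator=True`, which records where each row of an outer join came from:

```python
import pandas as pd

df1 = pd.DataFrame({'Key': ['A', 'B', 'C'], 'Value': [1, 2, 3]})
df2 = pd.DataFrame({'Key': ['A', 'B', 'D'], 'Value': [4, 5, 6]})

# Only keys present in both frames
inner = pd.merge(df1, df2, on='Key', how='inner')

# Every key from df1; unmatched df2 values become NaN
left = pd.merge(df1, df2, on='Key', how='left')

# '_merge' column: 'both', 'left_only' or 'right_only'
outer = pd.merge(df1, df2, on='Key', how='outer', indicator=True)
print(inner, left, outer, sep="\n\n")
```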

Concatenating DataFrames:

In [None]:
# Stack the rows of df2 below df1; ignore_index=True renumbers the result 0..5
concatenated_df = pd.concat([df1, df2], ignore_index=True)
print("\nConcatenated DataFrame:\n", concatenated_df)
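
`pd.concat` can also join frames side by side with `axis=1`, aligning rows on the index rather than stacking them. A sketch with a hypothetical `Score` column:

```python
import pandas as pd

df1 = pd.DataFrame({'Key': ['A', 'B', 'C'], 'Value': [1, 2, 3]})
extra = pd.DataFrame({'Score': [0.1, 0.2, 0.3]})  # hypothetical extra column

# Columns are placed next to each other; rows are matched by index label
side_by_side = pd.concat([df1, extra], axis=1)
print(side_by_side)
```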

4. Application in Data Science

Advantages of Using Pandas:

Efficiency: Pandas is built on top of NumPy, so operations on whole columns are vectorized and typically much faster than equivalent pure-Python loops, while its data structures and analysis tools remain easy to use.

Data Manipulation: It allows quick and easy data manipulation, cleaning, and transformation.

Versatility: It reads from and writes to many formats and sources, such as CSV, Excel and JSON files and SQL databases, making it versatile for various data science tasks.

Integration: It integrates well with other Python libraries, such as Matplotlib and Seaborn for visualization and scikit-learn for machine learning.
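
As an illustration of the format support, the sketch below reads CSV text and writes it back out. The CSV content is hypothetical and `io.StringIO` stands in for a file on disk; `read_csv` accepts either a path or a file-like object.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file
csv_text = "Name,Age\nJohn,28\nAnna,24\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df)

# With no path argument, to_csv returns the CSV as a string
out = df.to_csv(index=False)
print(out)
```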

Real-World Examples:

Data Cleaning: Pandas is often used to clean messy datasets by handling missing values, removing duplicates, and standardizing data formats.

Exploratory Data Analysis (EDA): It makes it quick to generate summary statistics, plot data distributions, and uncover patterns in the data.

Financial Analysis: It is widely used to analyze and manipulate time series data (for example resampling or computing rolling statistics), which is common in financial datasets.
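
Two of these tasks can be sketched in a few lines. The data below is hypothetical: a small table with inconsistent spacing and capitalization that hides duplicate rows, and a daily series standing in for prices that is resampled to weekly totals.

```python
import pandas as pd

# Data cleaning: hypothetical messy city names hiding duplicate rows
raw = pd.DataFrame({
    'City': [' Paris', 'paris', 'London ', 'London '],
    'Sales': [100, 100, 250, 250]
})

# Standardize the text first, then duplicates become exact matches
clean = raw.assign(City=raw['City'].str.strip().str.title())
clean = clean.drop_duplicates().reset_index(drop=True)
print(clean)

# Time series: 14 daily values resampled to weekly sums
daily = pd.Series(
    range(14),
    index=pd.date_range('2024-01-01', periods=14, freq='D')
)
weekly = daily.resample('W').sum()  # 'W' = weeks ending on Sunday
print(weekly)
```

Standardizing before deduplicating matters: `drop_duplicates()` compares values exactly, so `' Paris'` and `'paris'` would otherwise count as different rows.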

Conclusion

Pandas provides a powerful toolset for data handling and analysis in data science.
Its efficient handling of datasets that fit in memory and its integration with other Python libraries make it a standard tool for data science professionals.

You can now use these examples and explanations to get hands-on experience with Pandas and understand its role in data science.