Pandas is an open-source data analysis and manipulation library for Python. It is widely used in data science and machine learning to load, clean, and transform tabular and time series data. Here are some key features of pandas:

1) Data Structures: Pandas provides two primary data structures: Series and DataFrame. A Series is a one-dimensional labeled array, while a DataFrame is a two-dimensional labeled data structure with columns of potentially different types.

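2) Data Alignment and Handling Missing Data: Pandas aligns data on index labels automatically in computations, and it marks missing values (shown as NaN or None) so that they can be detected, filled, or dropped with methods such as isna(), fillna(), and dropna().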

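3) Label-Based Slicing and Indexing: You can select data by label with the .loc[] indexer and by integer position with the .iloc[] indexer. Slices with .loc[] include both endpoints, while slices with .iloc[] exclude the stop position.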

4) Group By Functionality: The groupby() method splits the data into groups based on some criteria, applies a function to each group independently, and combines the results.

5) Integration with Other Libraries: Works well with other data science libraries like NumPy, Matplotlib, and Scikit-learn.

In [1]:
# If pandas is not already installed, install it with the %pip magic,
# which targets the environment of the running notebook kernel
%pip install pandas



In [2]:
# Import pandas under its conventional alias, pd
import pandas as pd


In [3]:
# Creating a Series from a list
s = pd.Series([1, 3, 5, 7, 9])
print("Series:")
print(s)


Series:
0    1
1    3
2    5
3    7
4    9
dtype: int64
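
The Series above uses the default integer index. Since a Series is a labeled array, it can also carry explicit labels. The sketch below is a minimal illustration; the names `prices`, `banana_price`, and `first_two`, and the fruit data, are made up for this example.

```python
import pandas as pd

# Hypothetical labels and values, chosen only for illustration
prices = pd.Series([1.5, 2.0, 0.75], index=["apple", "banana", "cherry"])

# Look up a single value by its label
banana_price = prices["banana"]

# Label-based slices include both endpoints
first_two = prices.loc["apple":"banana"]

print(banana_price)
print(first_two)
```

Here `banana_price` is 2.0, and `first_two` contains the entries for "apple" and "banana".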


In [4]:
# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)
print("\nDataFrame:")
print(df)


DataFrame:
      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston
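
To see that the columns of a DataFrame can hold different types, one option is to inspect the dtypes attribute. The sketch below rebuilds the same DataFrame so that it runs on its own; the helper variables `age_is_numeric` and `name_is_numeric` are introduced only for this illustration.

```python
import pandas as pd

# Same data as in the cell above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
})

# One dtype per column: Age is an integer column, Name and City hold text
print(df.dtypes)

# Check column types programmatically
age_is_numeric = pd.api.types.is_numeric_dtype(df['Age'])
name_is_numeric = pd.api.types.is_numeric_dtype(df['Name'])
print(age_is_numeric, name_is_numeric)
```

The exact dtype name reported for the text columns may differ between pandas versions, which is why the check uses is_numeric_dtype() rather than comparing dtype names.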


# Data Alignment and Handling Missing Data:
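
As a minimal sketch of automatic alignment, the example below adds two Series whose indexes only partly overlap. The Series `a` and `b` and their labels are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Two Series with partly overlapping labels (hypothetical data)
a = pd.Series([10, 20, 30], index=["x", "y", "z"])
b = pd.Series([1, 2, 3], index=["y", "z", "w"])

# Values are matched by label, not by position;
# labels present in only one Series produce NaN
total = a + b
print(total)

# add() with fill_value treats a label missing on one side as 0
total_filled = a.add(b, fill_value=0)
print(total_filled)
```

In `total`, the labels "y" and "z" hold 21.0 and 32.0, while "x" and "w" are NaN because each appears in only one of the inputs.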

In [5]:
# Creating a DataFrame with missing data
data_with_nan = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, None, 32],
    'City': ['New York', 'Los Angeles', None, 'Houston']
}
df_with_nan = pd.DataFrame(data_with_nan)
print("\nDataFrame with Missing Data:")
print(df_with_nan)

# Filling missing data (a string fill value turns the numeric Age column into a mixed-type column)
df_filled = df_with_nan.fillna('Unknown')
print("\nDataFrame with Filled Missing Data:")
print(df_filled)


DataFrame with Missing Data:
      Name   Age         City
0    Alice  24.0     New York
1      Bob  27.0  Los Angeles
2  Charlie   NaN         None
3    David  32.0      Houston

DataFrame with Filled Missing Data:
      Name      Age         City
0    Alice     24.0     New York
1      Bob     27.0  Los Angeles
2  Charlie  Unknown      Unknown
3    David     32.0      Houston
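
Filling every column with the same string leaves Age holding both numbers and text. One alternative, sketched below, is to pass a dictionary to fillna() so that each column gets a fill value of a matching type. Using the mean age as the fill value is an assumption made for this illustration, not a general recommendation, and the names `missing_per_column`, `df_typed_fill`, and `df_dropped` are introduced here.

```python
import pandas as pd

# Same data as in the cell above
df_with_nan = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, None, 32],
    'City': ['New York', 'Los Angeles', None, 'Houston']
})

# Count missing values in each column
missing_per_column = df_with_nan.isna().sum()
print(missing_per_column)

# Fill each column with a value of a matching type
df_typed_fill = df_with_nan.fillna({
    'Age': df_with_nan['Age'].mean(),  # mean of the non-missing ages
    'City': 'Unknown'
})
print(df_typed_fill)

# Alternatively, drop every row that contains a missing value
df_dropped = df_with_nan.dropna()
print(df_dropped)
```

With this approach Age stays a float column, so numeric operations on it keep working.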


# Label-Based Slicing and Indexing:

In [6]:
# Selecting data by label (.loc slices include both endpoints, so 0:2 returns rows 0, 1, and 2)
print("\nSelecting by label:")
print(df.loc[0:2, ['Name', 'City']])

# Selecting data by integer position (.iloc slices exclude the stop, so 1:3 returns rows 1 and 2)
print("\nSelecting by integer position:")
print(df.iloc[1:3, 0:2])


Selecting by label:
      Name         City
0    Alice     New York
1      Bob  Los Angeles
2  Charlie      Chicago

Selecting by integer position:
      Name  Age
1      Bob   27
2  Charlie   22
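
The group by functionality listed among the key features can be sketched in the same way. The example below splits a small table by city, applies an aggregation to each group, and combines the results. The `sales` table and its values are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical sales records
sales = pd.DataFrame({
    'City': ['New York', 'Chicago', 'New York', 'Chicago', 'Houston'],
    'Amount': [100, 80, 150, 120, 90]
})

# Split by City, sum Amount within each group, combine into one Series
totals = sales.groupby('City')['Amount'].sum()
print(totals)

# Several aggregations at once yield one column per function
summary = sales.groupby('City')['Amount'].agg(['mean', 'count'])
print(summary)
```

Here `totals` holds 200 for Chicago, 90 for Houston, and 250 for New York, indexed by city name.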
