# Pandas
Pandas is an open-source Python library for data manipulation and analysis. It provides data structures and tools that make working with tabular data (think spreadsheets) in Python efficient and intuitive.

**Here's a breakdown of what Pandas offers:**

**Data Structures:**

1. **DataFrames:** The core data structure in Pandas. Imagine a DataFrame as a highly efficient spreadsheet within Python. It stores data in a tabular format with rows and columns, allowing you to organize and manage your data effectively.
2. **Series:** One-dimensional labeled arrays. A Series resembles a Python list, but every value carries an index label and the values share a single data type. Series are often used for focused analysis on specific data elements or to represent a single column extracted from a DataFrame.

**Data Manipulation Capabilities:**

1. **Importing Data:** Pandas provides functionalities to import data from various sources like CSV files, Excel spreadsheets, and even databases, making it easy to work with your existing data.
2. **Cleaning and Tidying:** Missing values, inconsistencies, and duplicate data can plague datasets. Pandas offers tools to clean, filter, and transform your data into a well-organized and usable format.
3. **Selection and Indexing:** Select specific rows, columns, or subsets of data by label, by position, or by condition.
4. **Data Transformation:** Reshape your data and merge or join datasets from different sources to prepare them for further analysis and modeling tasks.
5. **Data Aggregation and Summarization:** Calculate statistics such as mean, median, and standard deviation to summarize your datasets and understand the underlying trends.
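As a rough illustration of the capabilities listed above, the sketch below runs a tiny import, clean, select, and summarize pipeline. The CSV text, its column names, and the variable names are made up for this example; the data is inlined through `io.StringIO` so that no external file is needed.

```python
import io

import pandas as pd

# Hypothetical CSV content (invented for illustration), inlined as a string.
csv_text = """name,age,city
Alice,25,New York
Bob,30,Los Angeles
Bob,30,Los Angeles
Charlie,,Chicago
"""

# Importing: read_csv accepts a file path or any file-like object.
raw = pd.read_csv(io.StringIO(csv_text))

# Cleaning: drop exact duplicate rows, then rows whose age is missing.
clean = raw.drop_duplicates().dropna(subset=["age"])

# Selection: keep only the rows that satisfy a condition.
over_26 = clean[clean["age"] > 26]

# Aggregation: summarize a numeric column.
mean_age = clean["age"].mean()

print(clean)
print(over_26["name"].tolist())  # ['Bob']
print(mean_age)                  # 27.5
```

Because one age is missing, pandas reads the `age` column as floating point, which is why the mean is computed over `25.0` and `30.0` only.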

In [None]:
%pip install pandas

1. Importing Pandas:

In [2]:
import pandas as pd


2. Creating DataFrames:

A DataFrame stores data in a tabular format with rows and columns. One common way to create a DataFrame is from a dictionary that maps column names to lists of values:

In [4]:

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 38],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Seattle']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Print the DataFrame as plain text
print(df)

# Show the first rows (up to five by default) as the cell's rendered output
df.head()

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   22      Chicago
3    David   38      Seattle


|   | Name    | Age | City        |
|---|---------|-----|-------------|
| 0 | Alice   | 25  | New York    |
| 1 | Bob     | 30  | Los Angeles |
| 2 | Charlie | 22  | Chicago     |
| 3 | David   | 38  | Seattle     |


3. Exploring Series:

A Series in Pandas is a one-dimensional labeled array. It resembles a Python list, but each value is paired with an index label (0, 1, 2, ... by default). It's often used to represent a single column from a DataFrame or for focused analysis on specific data elements. Here's how to create a Series:

In [5]:
# Create a Series from a list; pandas assigns a default integer index
ser = pd.Series([25, 30, 22, 38])

# Print the Series: index labels on the left, values on the right
print(ser)


0    25
1    30
2    22
3    38
dtype: int64
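The default integer index can be replaced with custom labels. The following is a minimal sketch, assuming we reuse the ages from the DataFrame example and label them with the matching names; the variable names are chosen only for illustration.

```python
import pandas as pd

# Build a Series with explicit index labels and a name.
ages = pd.Series(
    [25, 30, 22, 38],
    index=["Alice", "Bob", "Charlie", "David"],
    name="Age",
)

bob_age = ages["Bob"]    # look up a value by its label
oldest = ages.idxmax()   # label of the largest value
next_year = ages + 1     # arithmetic applies to every element

print(bob_age)             # 30
print(oldest)              # David
print(next_year.tolist())  # [26, 31, 23, 39]
```

Operations such as `ages + 1` return a new Series and leave `ages` unchanged.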


4. Accessing Data in DataFrames:

  **Accessing by Column Name:** Use square brackets **[]** with the column name to access a specific column. The result is a Series.

In [8]:
# Access the 'City' column
cities = df['City']
print(cities)


0       New York
1    Los Angeles
2        Chicago
3        Seattle
Name: City, dtype: object
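Square brackets also accept a list of column names or a boolean condition. The sketch below rebuilds the same example DataFrame so that it runs on its own; `subset` and `over_24` are names picked for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 38],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Seattle'],
})

# A list of column names returns a DataFrame with just those columns.
subset = df[['Name', 'Age']]

# A boolean condition keeps only the rows where it is True.
over_24 = df[df['Age'] > 24]

print(subset)
print(over_24['Name'].tolist())  # ['Alice', 'Bob', 'David']
```

Note the difference in return type: `df['City']` gives a Series, while `df[['Name', 'Age']]` gives a DataFrame.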


  **Accessing by Row Index:** Use the **.iloc[]** indexer with a zero-based integer position to access a specific row. A single row is returned as a Series whose index holds the column names.

In [12]:
# Access the third row (position 2)
rec = df.iloc[2]
rec

Name    Charlie
Age          22
City    Chicago
Name: 2, dtype: object
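Position-based access with `.iloc[]` has a label-based counterpart, `.loc[]`. The following sketch contrasts the two on the same example data; setting `Name` as the index is an assumption made here purely to have meaningful row labels.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 22, 38],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Seattle'],
})

# Position-based: a slice returns rows 0 and 1 (the end position is excluded).
first_two = df.iloc[0:2]

# Position-based: row 2, column 0 returns a single value.
cell = df.iloc[2, 0]

# Label-based: use the 'Name' column as row labels, then look up by label.
by_name = df.set_index('Name')
david_city = by_name.loc['David', 'City']

print(first_two['Name'].tolist())  # ['Alice', 'Bob']
print(cell)                        # Charlie
print(david_city)                  # Seattle
```

`set_index` returns a new DataFrame, so `df` itself keeps its default integer index.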