### INTRODUCTION TO PANDAS

#### Introduction to NumPy

NumPy is the foundational Python library for numerical computing.  
It provides efficient array operations, mathematical functions, and tools for linear algebra and random number generation.
Pandas builds on NumPy to provide labeled, table-like data structures for analysis.


In [1]:
# arrays in numpy
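
The capabilities listed above can be sketched in a few lines. This is a minimal illustration with made-up values; the variable names are chosen here for the example only.

```python
import numpy as np

# Build a one-dimensional array from a Python list.
values = np.array([1.0, 2.0, 3.0, 4.0])

# Arithmetic applies element by element, with no explicit loop.
doubled = values * 2

# Reshape the same data into a 2x2 matrix and multiply it by itself
# (a small linear algebra operation).
matrix = values.reshape(2, 2)
product = matrix @ matrix

# Draw random numbers from a seeded generator so runs are repeatable.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=3)

print(doubled)
print(product)
print(samples.shape)
```

Operating on whole arrays at once, rather than looping in Python, is the main source of NumPy's efficiency.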

#### What is Pandas

Pandas is a Python library for data manipulation and analysis.  
It provides two core data structures:
- **Series**: one-dimensional labeled array  
- **DataFrame**: two-dimensional labeled table

Pandas is typically imported using the alias `pd`.  
Most workflows also import NumPy (`np`) since Pandas is built on top of it.


In [None]:
# imports
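
A typical import cell might look like the following sketch. The small Series at the end is an illustrative example, not part of any dataset used in this notebook.

```python
import numpy as np
import pandas as pd

# Print the installed versions to confirm both libraries are available.
print("pandas", pd.__version__)
print("numpy", np.__version__)

# A Series is a one-dimensional labeled array: values plus an index.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"])
```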

#### Creating a DataFrame

A **DataFrame** is a collection of Series objects sharing the same index.  
It represents tabular data with rows and columns.

In [2]:
# create a pandas dataframe
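
One common way to build a DataFrame is from a dictionary of columns. The sketch below uses hypothetical sample data (the names, ages, and cities are invented for illustration).

```python
import pandas as pd

# Hypothetical sample data: each dictionary key becomes a column.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cara"],
        "age": [28, 35, 42],
        "city": ["Lima", "Oslo", "Pune"],
    }
)

# Each column is a Series, and every column shares the DataFrame's index.
ages = df["age"]
print(type(ages).__name__)
print(ages.index.equals(df.index))

# shape is (number of rows, number of columns).
print(df.shape)
```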


#### Reading and Writing Data

Pandas supports multiple file formats:
- `read_csv()`, `to_csv()`  
- `read_excel()`, `to_excel()` 
- `read_sql()`, `to_sql()` for SQL databases
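
A CSV round trip can be sketched as follows. To keep the example self-contained it writes to an in-memory buffer instead of a file on disk; the column names and values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical data to write out.
df = pd.DataFrame({"id": [1, 2, 3], "score": [9.5, 7.0, 8.25]})

# Write CSV text to an in-memory buffer; index=False omits the row labels.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV text back into a new DataFrame.
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
```

Passing a file path such as `"data.csv"` (a placeholder name) in place of the buffer works the same way. The Excel functions follow the same pattern but rely on an optional engine package being installed.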

#### Viewing Data

Basic inspection methods include:
- `head()` and `tail()` to preview the first and last rows
- `info()` to display the index, column names, data types, and non-null counts

Together, these show how large a dataset is and how complete it is.


In [None]:
# code ...
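
A short sketch of these inspection methods, using a small generated DataFrame (the columns `x` and `y` are placeholders):

```python
import pandas as pd

# Ten rows of generated data.
df = pd.DataFrame({"x": range(10), "y": [v * 0.5 for v in range(10)]})

# head(n) and tail(n) return the first and last n rows (n defaults to 5).
first_rows = df.head(3)
last_rows = df.tail(2)
print(first_rows)
print(last_rows)

# info() prints a summary of the index, columns, dtypes, and non-null counts.
df.info()

# shape gives (rows, columns) directly.
print(df.shape)
```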

#### Accessing Columns and Rows

Data can be accessed using:
- Column names (`df['col']`)
- Row indexers (`df.loc[]`, `df.iloc[]`)

`loc` selects by label, while `iloc` selects by integer position.


In [3]:
# code
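
The difference between label-based and position-based access is easiest to see with a non-numeric index. The fruit labels and values below are hypothetical.

```python
import pandas as pd

# A DataFrame with string row labels.
df = pd.DataFrame(
    {"price": [10.0, 12.5, 8.0], "qty": [3, 1, 7]},
    index=["apple", "pear", "plum"],
)

prices = df["price"]              # one column, returned as a Series
pear_row = df.loc["pear"]         # row selected by its label
first_row = df.iloc[0]            # row selected by integer position
plum_qty = df.loc["plum", "qty"]  # single value by row and column label
last_price = df.iloc[-1, 0]       # single value by row and column position

print(pear_row)
print(first_row.name, plum_qty, last_price)
```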

#### Indexing and Slicing

Subsets of data can be selected using:
- Slicing syntax (`df[0:5]`), which selects rows by position
- Conditional filters (`df[df['col'] > 10]`), which keep rows where the condition is true

Both narrow a dataset down to the rows of interest.


In [None]:
# code
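
Both selection styles are sketched below on a single hypothetical column named `col`.

```python
import pandas as pd

df = pd.DataFrame({"col": [4, 15, 9, 22, 11, 3]})

# Row slice: positions 0, 1, and 2 (the end position is excluded).
first_three = df[0:3]

# Conditional filter: the comparison yields a boolean Series,
# and indexing with it keeps only the rows marked True.
above_ten = df[df["col"] > 10]

print(first_three)
print(above_ten)
```

Note that the filtered result keeps the original row labels (1, 3, and 4 here) rather than renumbering from zero.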

#### Data Types and Conversion

Pandas infers column data types automatically.
  
Use:
- `df.dtypes` to view types  
- `astype()` to convert between types
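
A minimal sketch of inspecting and converting types, assuming a column of numbers that was loaded as text (the column names are hypothetical):

```python
import pandas as pd

# "count" holds digits stored as strings; "ratio" holds floats.
df = pd.DataFrame({"count": ["1", "2", "3"], "ratio": [0.5, 1.5, 2.5]})

# Show the type pandas inferred for each column.
print(df.dtypes)

# Convert the text column to 64-bit integers.
df["count"] = df["count"].astype("int64")

# Converting floats to integers truncates the fractional part.
df["ratio_int"] = df["ratio"].astype("int64")

print(df.dtypes)
```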

#### Handling Missing Data

Missing values usually appear as `NaN` (or `NaT` in datetime columns).  
Useful functions:
- `isna()` to detect  
- `fillna()` to replace  
- `dropna()` to remove
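
The three functions can be sketched on a small DataFrame with gaps. The data is invented for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})

# isna() marks each missing cell as True; summing counts them per column.
missing_per_column = df.isna().sum()

# fillna() returns a copy with missing values replaced (here by 0).
filled = df.fillna(0)

# dropna() returns a copy without any row that contains a missing value.
complete_rows = df.dropna()

print(missing_per_column)
print(filled)
print(complete_rows)
```

Both `fillna()` and `dropna()` return new objects by default, so `df` itself is unchanged.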


#### Descriptive Statistics
Use `describe()` for a quick summary of numeric data.  
Other methods include:
- `mean()` and `median()` for central tendency, and `std()` for spread  
- `corr()` for relationships between columns
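
A brief sketch of these summary methods on hypothetical data, where `y` is exactly twice `x`:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10]})

# describe() reports count, mean, std, min, quartiles, and max per column.
summary = df.describe()

mean_x = df["x"].mean()
median_x = df["x"].median()
std_x = df["x"].std()  # sample standard deviation (divides by n - 1)

# corr() returns the pairwise Pearson correlation matrix by default.
correlations = df.corr()

print(summary)
print(mean_x, median_x, std_x)
print(correlations)
```

Because `y` is a linear function of `x`, their correlation here is 1.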


#### Filtering Data

Use boolean indexing to filter rows.  
For example, select the rows where a column meets a condition, much like a SQL `WHERE` clause.
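
A sketch of a combined filter, roughly equivalent to `WHERE dept = 'dev' AND salary > 70`. The table and its column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "dept": ["ops", "dev", "dev", "ops"],
        "salary": [50, 80, 65, 72],
    }
)

# Combine conditions with & (and) or | (or); each condition needs
# its own parentheses because of operator precedence.
mask = (df["dept"] == "dev") & (df["salary"] > 70)
well_paid_devs = df[mask]

# query() expresses the same filter as a string.
same_result = df.query("dept == 'dev' and salary > 70")

print(well_paid_devs)
```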


#### Adding and Modifying Columns
New columns can be created or updated using:
- Arithmetic operations between columns  
- Assignments with new data or expressions
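
Both approaches are sketched below with hypothetical `price` and `qty` columns.

```python
import pandas as pd

df = pd.DataFrame({"price": [10.0, 20.0, 5.0], "qty": [2, 1, 4]})

# New column from arithmetic between existing columns (row by row).
df["total"] = df["price"] * df["qty"]

# Update an existing column in place by assigning an expression to it.
df["price"] = df["price"] + 1.0

# Assigning a single value fills every row of the new column.
df["currency"] = "USD"

print(df)
```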


#### Grouping and Aggregation


`groupby()` allows summarizing data by one or more keys.  
Common aggregations include:
- `sum()`
- `mean()`
- `count()`
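
A minimal sketch of grouping by one key and applying these aggregations. The `team` and `points` columns are invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "team": ["a", "b", "a", "b", "a"],
        "points": [10, 4, 6, 8, 2],
    }
)

# Group rows by team, then aggregate the points column within each group.
grouped = df.groupby("team")["points"]
totals = grouped.sum()
averages = grouped.mean()
counts = grouped.count()

# agg() computes several aggregations at once, one column per function.
summary = grouped.agg(["sum", "mean", "count"])

print(summary)
```

To group by more than one key, pass a list of column names to `groupby()`.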
