# Self-Learning Pandas

## Quick Overview of Pandas

Pandas is a Python library for data manipulation and analysis that is widely used in data science. It provides two primary data structures:

1. **DataFrame**: A 2D labeled table of rows and columns, like an Excel spreadsheet. Ideal for handling structured data.
2. **Series**: A 1D labeled array, akin to a single column in a DataFrame.

Pandas excels in:

- **Data Import and Export**: Reading and writing data in formats such as CSV, Excel, JSON, and SQL tables.
- **Data Cleaning**: Handling missing data and preparing datasets for analysis.
- **Data Transformation**: Reshaping, merging, and slicing datasets.
- **Analysis**: Providing tools for statistical analysis, grouping, and time-series data.
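
The four areas above can be sketched in one short pipeline. This is a minimal illustration, not a recommended workflow: the CSV content, the `regions` table, and all column names are made up for the example, and the data is kept in memory so no files are needed.

```python
import io

import pandas as pd

# Hypothetical CSV content, held in memory so the example needs no files.
csv_text = """city,year,sales
Oslo,2022,100
Oslo,2023,
Bergen,2022,80
Bergen,2023,90
"""

# Data import: read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Data cleaning: the empty field is read as NaN; count it, then fill it with 0.
missing_count = int(df["sales"].isna().sum())
clean = df.fillna({"sales": 0})

# Data transformation: attach a region column from a second (made-up) table.
regions = pd.DataFrame({"city": ["Oslo", "Bergen"], "region": ["East", "West"]})
merged = clean.merge(regions, on="city", how="left")

# Analysis: total sales per city.
totals = merged.groupby("city")["sales"].sum()

# Data export: to_csv with no path returns the CSV text as a string.
exported = merged.to_csv(index=False)

print(missing_count)
print(totals)
```

Filling the missing value with 0 is only one possible choice; dropping the row with `dropna` or filling with a mean may suit other datasets better.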

## Advancing with Pandas

Given your proficiency in Python and basic data analysis, we can move on to more advanced Pandas functionalities. Let's start with some key operations:

1. **Data Exploration**: Using methods like `.head()`, `.tail()`, and `.describe()` to quickly explore datasets.
2. **Indexing and Selection**: Accessing specific subsets of data using `.loc` (label-based) and `.iloc` (position-based).
3. **Grouping and Aggregation**: Using `groupby` for aggregating data based on categories.
4. **Pivot Tables**: Reshaping data for in-depth analysis.
5. **Time-Series Analysis**: Handling dates and times, which is vital in many data analysis tasks.
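
As a rough sketch of how the five operations listed above look in practice, the block below runs each one on a tiny table. The `sales` data and its column names are hypothetical, and `"MS"` (month start) is just one of several resampling frequencies that could be used here.

```python
import pandas as pd

# Hypothetical sales records, made up for illustration.
sales = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-18"]
        ),
        "store": ["A", "B", "A", "B"],
        "product": ["pen", "pen", "ink", "pen"],
        "units": [10, 4, 7, 5],
    }
)

# 1. Data exploration: peek at the first rows and summarize a numeric column.
first_rows = sales.head(2)
summary = sales["units"].describe()

# 2. Indexing and selection: loc uses labels and boolean masks,
#    iloc uses integer positions (row 0, column 3 is the first "units" value).
store_a = sales.loc[sales["store"] == "A", ["date", "units"]]
first_units = sales.iloc[0, 3]

# 3. Grouping and aggregation: total units per store.
units_per_store = sales.groupby("store")["units"].sum()

# 4. Pivot tables: stores as rows, products as columns, summed units as values.
pivot = sales.pivot_table(
    index="store", columns="product", values="units", aggfunc="sum", fill_value=0
)

# 5. Time-series analysis: index by date, then total units per calendar month.
monthly = sales.set_index("date")["units"].resample("MS").sum()

print(units_per_store)
print(pivot)
print(monthly)
```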

In [3]:
```python
# Series in Pandas
import numpy as np
import pandas as pd

# Build a Series from a 1D NumPy array with custom index labels.
my_array_1D = np.array([100, 200, 300])
index = ['a', 'b', 'c']
my_series = pd.Series(my_array_1D, index=index)

# .iloc selects by integer position; .loc selects by index label.
print("The first element of \"my_series\": ", my_series.iloc[0])
print("The 'c' element of \"my_series\": ", my_series.loc['c'])
```

```text
The first element of "my_series":  100
The 'c' element of "my_series":  300
```


In [2]:
```python
# DataFrame in Pandas
import numpy as np
import pandas as pd

# Build a DataFrame from a 2D NumPy array with row and column labels.
my_array_2D = np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5]])

row_index = ['row1', 'row2', 'row3']
col_name = ['col1', 'col2', 'col3']

my_dataframe = pd.DataFrame(my_array_2D, index=row_index, columns=col_name)

print("my_dataframe: ")
print(my_dataframe)

# Selecting a single column by name returns a Series.
print("access the data of 'col3': ")
print(my_dataframe['col3'])
```

```text
my_dataframe: 
      col1  col2  col3
row1     1     2     3
row2     2     3     4
row3     3     4     5
access the data of 'col3': 
row1    3
row2    4
row3    5
Name: col3, dtype: int32
```
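
The DataFrame cell above selects a whole column by name. Rows, single cells, and rectangular slices can be reached with `.loc` and `.iloc` in the same way as for a Series. The sketch below rebuilds the same 3x3 table so it runs on its own; the variable names for the results are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the same 3x3 table so this sketch is self-contained.
my_dataframe = pd.DataFrame(
    np.array([[1, 2, 3], [2, 3, 4], [3, 4, 5]]),
    index=["row1", "row2", "row3"],
    columns=["col1", "col2", "col3"],
)

# A single row by label; the result is a Series indexed by column name.
row2 = my_dataframe.loc["row2"]

# A single cell, first by label and then by integer position.
cell_by_label = my_dataframe.loc["row3", "col1"]
cell_by_position = my_dataframe.iloc[2, 0]

# Label slices include the end label; position slices exclude the end position.
by_label = my_dataframe.loc["row1":"row2", "col1":"col2"]
by_position = my_dataframe.iloc[0:2, 0:2]

print(row2)
print(by_label)
```

Both slices select the same 2x2 block, which shows the difference in end-point handling: `"row1":"row2"` covers two labels, while `0:2` covers positions 0 and 1.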
