# Microsoft AI School - Pandas Library
* Date: 12/27/2024
* Python Version: 3.13.1
* Notes: Pandas Version: 2.2.3

# 1. Pandas
* Python Data Analysis Library
* A library for data manipulation and analysis.
* Provides a wide range of data analysis functions.
* Allows creation and manipulation of data objects structured in rows and columns.

In [1]:
%pip install pandas

Collecting pandas
  Downloading pandas-2.2.3-cp313-cp313-win_amd64.whl.metadata (19 kB)
Collecting pytz>=2020.1 (from pandas)
  Downloading pytz-2024.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting tzdata>=2022.7 (from pandas)
  Downloading tzdata-2024.2-py2.py3-none-any.whl.metadata (1.4 kB)
Downloading pandas-2.2.3-cp313-cp313-win_amd64.whl (11.5 MB)

In [4]:
import pandas as pd  # needed before pd can be referenced
pd.__version__

'2.2.3'

# 2. Data Structures
* There are two main structures:
    * Series
    * DataFrame

## A. Series
* A one-dimensional labeled array.
* Holds data of a single type (dtype).
* The index can be specified explicitly (for example, as strings); if omitted, it defaults to integers starting at 0.
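The single-type rule above can be seen by mixing value types and checking the resulting `dtype`. This is a minimal sketch; the variable names `mixed_numbers` and `mixed_types` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Mixing ints and floats: pandas upcasts every element to float64.
mixed_numbers = pd.Series([1, 2, 3.5])
print(mixed_numbers.dtype)

# Mixing numbers and strings: pandas falls back to the generic object dtype.
mixed_types = pd.Series([1, "two", 3.0])
print(mixed_types.dtype)

# With no index argument, the labels default to 0, 1, 2, ...
print(list(mixed_numbers.index))
```

In both cases the Series still has exactly one dtype; pandas picks the narrowest type that can hold all of the values.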

In [2]:
import pandas as pd

In [6]:
s = pd.Series([3, -5, 7, 4])
print(s)
print(type(s))

0    3
1   -5
2    7
3    4
dtype: int64
<class 'pandas.core.series.Series'>


In [9]:
s = pd.Series([3, -5, 7, 4], index = ["a", "b", "c", "d"])
print(s)
print(s.index)
print(s.values)

a    3
b   -5
c    7
d    4
dtype: int64
Index(['a', 'b', 'c', 'd'], dtype='object')
[ 3 -5  7  4]
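Once a Series has string labels, elements can be selected by label as well as by position. The sketch below reuses the same values as the cell above; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=["a", "b", "c", "d"])

# Label-based lookup: plain brackets or the explicit .loc accessor.
by_label = s["b"]
by_loc = s.loc["b"]

# Position-based lookup: .iloc always counts from 0, regardless of labels.
by_position = s.iloc[1]

# A list of labels returns a smaller Series.
subset = s[["a", "d"]]

print(by_label, by_loc, by_position)
print(subset)
```

Using `.loc` and `.iloc` explicitly tends to be clearer than bare brackets, since it states whether a label or a position is meant.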


In [11]:
# Element-wise arithmetic works as it does in NumPy
print(s * 3)

a     9
b   -15
c    21
d    12
dtype: int64
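Other NumPy-style operations behave the same way and keep the index labels attached. A small sketch, assuming the same Series `s` as above (NumPy is installed as a pandas dependency):

```python
import numpy as np
import pandas as pd

s = pd.Series([3, -5, 7, 4], index=["a", "b", "c", "d"])

# Comparison produces a boolean Series, which can be used as a mask.
positives = s[s > 0]

# NumPy universal functions accept a Series and return a Series.
absolute = np.abs(s)

# Arithmetic between two Series is also element-wise.
tripled = s + s * 2

print(positives)
print(absolute)
print(tripled)
```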


## B. DataFrame
* A two-dimensional array (similar to a spreadsheet).
* Data is stored in a table-like format, with two main axes: index and columns.
* A DataFrame is essentially a collection of Series that share the same index, one Series per column.
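To illustrate the "collection of Series" idea, the sketch below builds a small DataFrame from two Series that share an index, then pulls one column back out. The variable names (`area`, `population`, `states`) are illustrative only.

```python
import pandas as pd

states = ["Pennsylvania", "New York"]
area = pd.Series([267, 212], index=states)
population = pd.Series([12.96, 8.26], index=states)

# Each dictionary key becomes a column name; the shared index becomes the row labels.
df = pd.DataFrame({"Area Code": area, "Population (in millions)": population})

# Selecting a single column returns a Series again.
column = df["Area Code"]
print(type(column))
print(df)
```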

### I. Creating a DataFrame
* In Pandas, a DataFrame can be created from a list of lists (a 2D list) or from a dictionary.

In [13]:
df = pd.DataFrame({"State": ["Pennsylvania", "New York", "California"], 
                   "Area Code": [267, 212, 323], 
                   "Population (in millions)": [12.96, 8.26, 38.97]})
print(type(df))
df

<class 'pandas.core.frame.DataFrame'>


| | State | Area Code | Population (in millions) |
|---|---|---|---|
| 0 | Pennsylvania | 267 | 12.96 |
| 1 | New York | 212 | 8.26 |
| 2 | California | 323 | 38.97 |
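The same table can also be built from a list of lists, the other option mentioned above. In this sketch each inner list is one row, and the column names are passed separately through the `columns` argument; the names `rows` and `df_from_rows` are illustrative.

```python
import pandas as pd

# Each inner list is one row of the table.
rows = [
    ["Pennsylvania", 267, 12.96],
    ["New York", 212, 8.26],
    ["California", 323, 38.97],
]
column_names = ["State", "Area Code", "Population (in millions)"]

df_from_rows = pd.DataFrame(rows, columns=column_names)

# The dictionary form used earlier, for comparison: each key is one column.
df_from_dict = pd.DataFrame({
    "State": ["Pennsylvania", "New York", "California"],
    "Area Code": [267, 212, 323],
    "Population (in millions)": [12.96, 8.26, 38.97],
})

# True when both frames hold the same values, labels, and dtypes.
print(df_from_rows.equals(df_from_dict))
```

The dictionary form is organized by column and the list-of-lists form by row, so the choice usually depends on how the source data is already laid out.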


In [14]:
df.index

RangeIndex(start=0, stop=3, step=1)

In [15]:
df.columns

Index(['State', 'Area Code', 'Population (in millions)'], dtype='object')

In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   State                     3 non-null      object 
 1   Area Code                 3 non-null      int64  
 2   Population (in millions)  3 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 204.0+ bytes


In [17]:
# Returns a new DataFrame indexed by State; df itself is not modified
df.set_index("State")

| State | Area Code | Population (in millions) |
|---|---|---|
| Pennsylvania | 267 | 12.96 |
| New York | 212 | 8.26 |
| California | 323 | 38.97 |
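Because `set_index` returns a new DataFrame by default, the result has to be assigned to a name to be kept. The sketch below, using the hypothetical name `indexed`, shows this along with a label-based lookup on the new index and `reset_index` to undo the change.

```python
import pandas as pd

df = pd.DataFrame({
    "State": ["Pennsylvania", "New York", "California"],
    "Area Code": [267, 212, 323],
    "Population (in millions)": [12.96, 8.26, 38.97],
})

# Keep the re-indexed result under a new name; df is left as it was.
indexed = df.set_index("State")
print(list(df.columns))       # "State" is still a regular column here
print(list(indexed.columns))  # "State" has moved into the index

# With State as the index, rows can be looked up by state name.
print(indexed.loc["New York", "Area Code"])

# reset_index moves State back into a regular column.
restored = indexed.reset_index()
print(list(restored.columns))
```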
