# Intro

### What is Python?

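- Interpreted: code runs without a separate compile step (CPython compiles source to bytecode automatically and then interprets it)
- Interactive: code can be executed line by line in a REPL or notebook, which makes experimentation easy
- Object-oriented: based on the concept of classes and objects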


### Key Features

Incorporates:

- Modules
- Exceptions
- Dynamic typing
- High-level data types
- Classes
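
A small sketch that touches each feature listed above: a standard-library module (`math`), exception handling, dynamic typing, built-in high-level data types, and a class. The names `Point`, `safe_sqrt`, `item` and `scores` are hypothetical and exist only for illustration.

```python
import math  # Modules: reuse functionality from the standard library


class Point:
    """Classes: bundle data and behaviour together."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance_to_origin(self):
        return math.hypot(self.x, self.y)


def safe_sqrt(value):
    """Exceptions: handle an error instead of crashing."""
    try:
        return math.sqrt(value)
    except ValueError:
        # math.sqrt raises ValueError for negative inputs
        return None


# Dynamic typing: the same name can be rebound to an object of another type.
item = 42
item = "forty-two"

# High-level data types: dicts, lists and sets are built in.
scores = {"alice": [90, 85], "bob": [72]}
unique_lengths = {len(v) for v in scores.values()}

print(Point(3, 4).distance_to_origin())     # 5.0
print(safe_sqrt(-1))                        # None
print(type(item).__name__, unique_lengths)  # str {1, 2}
```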

Supports multiple paradigms:
- Object-oriented
- Procedural
- Functional programming
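
To make the three paradigms concrete, here is one hedged example that computes the same value (the sum of the squares of the even numbers in a list) in each style. The class name `EvenSquareSummer` is hypothetical.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]

# Procedural: an explicit loop with a mutable accumulator.
total = 0
for n in numbers:
    if n % 2 == 0:
        total += n * n


# Object-oriented: state and behaviour live together in a class.
class EvenSquareSummer:
    def __init__(self, values):
        self.values = values

    def total(self):
        return sum(n * n for n in self.values if n % 2 == 0)


# Functional: compose filter, map and reduce without mutating state.
functional_total = reduce(
    lambda acc, x: acc + x,
    map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers)),
    0,
)

print(total, EvenSquareSummer(numbers).total(), functional_total)  # 56 56 56
```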

### Why Python Stands Out

- Clear and readable syntax
- Portability

Runs on many operating systems:

- Unix / Linux
- macOS
- Windows

The same code can often run unmodified across platforms.
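
One common way to keep file-handling code portable is the standard-library `pathlib` module, which chooses the correct path separator for the current OS. This is a sketch; the `data/adult.csv` path is a hypothetical example and the file does not need to exist.

```python
import os
import platform
from pathlib import Path

# The / operator joins path parts with the separator of the current OS,
# so this line needs no changes between Linux, macOS and Windows.
data_file = Path("data") / "adult.csv"

print(platform.system())  # e.g. 'Linux', 'Darwin' or 'Windows'
print(data_file)          # data/adult.csv on Linux/macOS, data\adult.csv on Windows
print(os.sep)             # the separator used on this platform
```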


# Pandas Section

This section will showcase the basics of pandas, focusing on:
- **Series**: one-dimensional labeled arrays
- **DataFrame**: two-dimensional tabular data

We'll explore creation, indexing, and operations.

```python
import pandas as pd
```

## 1. Series
### 1.1 Creating a Series

```python
s1 = pd.Series([1, 3, 5, 0, 6, 8])
s2 = pd.Series({'a': 10, 'b': 20, 'c': 30})
s3 = pd.Series(data=[9, 4, 2, 0, 1, 8], index=['a', 'b', 'c', 'd', 'e', 'f'])
print(s1)
print()
print(s2)
print()
print(s3)
```

```
0    1
1    3
2    5
3    0
4    6
5    8
dtype: int64

a    10
b    20
c    30
dtype: int64

a    9
b    4
c    2
d    0
e    1
f    8
dtype: int64
```
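
A few other constructor options are worth knowing. This sketch uses only documented `pd.Series` arguments (`index`, `dtype`, `name`); the variable names are illustrative.

```python
import pandas as pd

# A scalar is repeated once for every label in the index.
ones = pd.Series(1, index=['x', 'y', 'z'])

# dtype and name can be set explicitly when the Series is created.
prices = pd.Series([9.5, 4.0, 2.25], index=['a', 'b', 'c'], dtype='float64', name='price')

# A dict plus an explicit index selects and reorders the keys;
# labels missing from the dict ('z') become NaN.
subset = pd.Series({'a': 10, 'b': 20, 'c': 30}, index=['c', 'a', 'z'])

print(ones)
print(prices)
print(subset)
```

Because `subset` contains a NaN, pandas stores it as `float64` rather than `int64`.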


### 1.2 Accessing Data

```python
print('Access by label:', s2['b'])
print('Access by position:', s2.iloc[1])
print('First 2 elements:\n' + str(s2.iloc[:2]))
```

```
Access by label: 20
Access by position: 20
First 2 elements:
a    10
b    20
dtype: int64
```
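
The two accessors behave differently when slicing, and a Series can also be filtered with a boolean mask. A short sketch using the same `s2` as above:

```python
import pandas as pd

s2 = pd.Series({'a': 10, 'b': 20, 'c': 30})

print(s2.loc['b'])      # label-based access: 20
print(s2.iloc[-1])      # position-based access, last element: 30
print(s2.loc['a':'b'])  # label slices INCLUDE the end label: a and b
print(s2[s2 > 15])      # boolean mask keeps only b and c
print(s2.get('z', 0))   # missing label returns the default instead of raising KeyError: 0
```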


### 1.3 Operations

```python
# Arithmetic is applied element-wise; the index labels are kept
s3 * 2
```

```
a    18
b     8
c     4
d     0
e     2
f    16
dtype: int64
```
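
When two Series are combined, pandas aligns them by index label rather than by position. This sketch shows the NaN that results for labels present in only one operand, and the `fill_value` argument of `Series.add`; the second Series `other` is made up for illustration.

```python
import pandas as pd

s2 = pd.Series({'a': 10, 'b': 20, 'c': 30})
other = pd.Series({'b': 1, 'c': 2, 'd': 3})

# Labels are matched before adding; 'a' and 'd' exist in only one Series -> NaN.
print(s2 + other)

# fill_value treats a missing label as 0 before the addition.
print(s2.add(other, fill_value=0))

# Aggregations and comparisons also operate on the whole Series at once.
print(s2.sum(), s2.mean(), (s2 > 15).sum())  # 60 20.0 2
```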

## 2. DataFrame


### 2.1 Creating a DataFrame

```python
# Load the Adult census-income data; assumes adult.csv is in the working directory
df = pd.read_csv('adult.csv')
```
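
The cell above creates a DataFrame by reading a file. As a hedged sketch, a DataFrame can also be built directly from a dict of equal-length lists, and `read_csv` accepts any file-like object. Here `io.StringIO` stands in for `adult.csv`, and the three rows are copied from the `head()` output in section 2.2.

```python
import io

import pandas as pd

# From a dict of lists: keys become column names, list items become rows.
people = pd.DataFrame({
    'age': [90, 82, 66],
    'workclass': ['?', 'Private', '?'],
    'income': ['<=50K', '<=50K', '<=50K'],
})

# read_csv works on in-memory text too, which is handy for quick tests.
csv_text = "age,workclass,income\n90,?,<=50K\n82,Private,<=50K\n66,?,<=50K\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

print(people.shape)  # (3, 3)
print(from_csv.dtypes)
```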

### 2.2 Inspecting the DataFrame

```python
df.head()
```

|   | age | workclass | education | marital.status | occupation | relationship | race | sex | hours.per.week | native.country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 90 | ? | HS-grad | Widowed | ? | Not-in-family | White | Female | 40 | United-States | <=50K |
| 1 | 82 | Private | HS-grad | Widowed | Exec-managerial | Not-in-family | White | Female | 18 | United-States | <=50K |
| 2 | 66 | ? | Some-college | Widowed | ? | Unmarried | Black | Female | 40 | United-States | <=50K |
| 3 | 54 | Private | 7th-8th | Divorced | Machine-op-inspct | Unmarried | White | Female | 40 | United-States | <=50K |
| 4 | 41 | Private | Some-college | Separated | Prof-specialty | Own-child | White | Female | 40 | United-States | <=50K |


```python
df.dtypes
```

```
age                int64
workclass         object
education         object
marital.status    object
occupation        object
relationship      object
race              object
sex               object
hours.per.week     int64
native.country    object
income            object
dtype: object
```

```python
# Number of rows and columns
df.shape
```

```
(32561, 11)
```

```python
# Total number of cells: rows × columns
df.size
```

```
358171
```

```python
df.columns
```

```
Index(['age', 'workclass', 'education', 'marital.status', 'occupation',
       'relationship', 'race', 'sex', 'hours.per.week', 'native.country',
       'income'],
      dtype='object')
```

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   age             32561 non-null  int64 
 1   workclass       32561 non-null  object
 2   education       32561 non-null  object
 3   marital.status  32561 non-null  object
 4   occupation      32561 non-null  object
 5   relationship    32561 non-null  object
 6   race            32561 non-null  object
 7   sex             32561 non-null  object
 8   hours.per.week  32561 non-null  int64 
 9   native.country  32561 non-null  object
 10  income          32561 non-null  object
dtypes: int64(2), object(9)
memory usage: 2.7+ MB
```


```python
df.describe()
```

|   | age | hours.per.week |
|---|---|---|
| count | 32561.0 | 32561.0 |
| mean | 38.581647 | 40.437456 |
| std | 13.640433 | 12.347429 |
| min | 17.0 | 1.0 |
| 25% | 28.0 | 40.0 |
| 50% | 37.0 | 40.0 |
| 75% | 48.0 | 45.0 |
| max | 90.0 | 99.0 |
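
`describe()` only summarizes the numeric columns by default, and `info()` reports no nulls because this dataset encodes missing values as `'?'`. The sketch below, on a few rows copied from the `head()` output, shows how text columns can be summarized and how the `'?'` marker can be counted per column.

```python
import pandas as pd

# Five rows taken from the head() output above, for illustration.
sample = pd.DataFrame({
    'age': [90, 82, 66, 54, 41],
    'workclass': ['?', 'Private', '?', 'Private', 'Private'],
    'sex': ['Female'] * 5,
})

# For text columns, describe() reports count, unique, top (most frequent) and freq.
print(sample[['workclass', 'sex']].describe())

# value_counts() gives the full frequency table of one column.
print(sample['workclass'].value_counts())

# Counting the '?' marker per column reveals "missing" values that info() cannot see.
print((sample == '?').sum())
```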


### 2.3 Data Transformation

```python
# The dataset marks missing values with '?'; replace them with a readable label
df.replace('?', 'Unknown', inplace=True)
```
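
Replacing `'?'` with the string `'Unknown'` keeps the values as ordinary text. An alternative worth considering is to turn the marker into a real missing value (NaN), either after loading or while parsing via the `na_values` argument of `read_csv`, so that `isna()`, `dropna()` and `fillna()` work on it. A sketch on a tiny made-up frame:

```python
import io

import numpy as np
import pandas as pd

sample = pd.DataFrame({
    'workclass': ['?', 'Private', '?'],
    'occupation': ['?', 'Exec-managerial', '?'],
})

# Option 1: convert the marker after loading.
as_missing = sample.replace('?', np.nan)
print(as_missing.isna().sum())  # 2 missing values in each column

# Option 2: treat '?' as missing while parsing the CSV.
csv_text = "workclass,occupation\n?,?\nPrivate,Exec-managerial\n?,?\n"
parsed = pd.read_csv(io.StringIO(csv_text), na_values='?')
print(parsed.isna().sum())      # same counts as option 1

# A readable label can still be applied afterwards if one is wanted.
print(as_missing.fillna('Unknown'))
```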

```python
# Recode the income labels as Yes / No in a single replace call
df['income'] = df['income'].replace({'>50K': 'Yes', '<=50K': 'No'})
df
```

|   | age | workclass | education | marital.status | occupation | relationship | race | sex | hours.per.week | native.country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 90 | Unknown | HS-grad | Widowed | Unknown | Not-in-family | White | Female | 40 | United-States | No |
| 1 | 82 | Private | HS-grad | Widowed | Exec-managerial | Not-in-family | White | Female | 18 | United-States | No |
| 2 | 66 | Unknown | Some-college | Widowed | Unknown | Unmarried | Black | Female | 40 | United-States | No |
| 3 | 54 | Private | 7th-8th | Divorced | Machine-op-inspct | Unmarried | White | Female | 40 | United-States | No |
| 4 | 41 | Private | Some-college | Separated | Prof-specialty | Own-child | White | Female | 40 | United-States | No |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | 22 | Private | Some-college | Never-married | Protective-serv | Not-in-family | White | Male | 40 | United-States | No |
| 32557 | 27 | Private | Assoc-acdm | Married-civ-spouse | Tech-support | Wife | White | Female | 38 | United-States | No |
| 32558 | 40 | Private | HS-grad | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 40 | United-States | Yes |
| 32559 | 58 | Private | HS-grad | Widowed | Adm-clerical | Unmarried | White | Female | 40 | United-States | No |
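
Another option for recoding a column is `Series.map` with a dict. Unlike `replace`, any value not in the dict becomes NaN, which makes unexpected spellings easy to spot. For filtering and counting, a boolean flag is often more convenient than Yes/No strings. A sketch on three made-up values:

```python
import pandas as pd

income = pd.Series(['<=50K', '>50K', '<=50K'])

# Every value is looked up in the dict; unmapped values would become NaN.
labels = income.map({'>50K': 'Yes', '<=50K': 'No'})

# A boolean column: True where income is above 50K.
high_income = income.eq('>50K')

print(labels.tolist())         # ['No', 'Yes', 'No']
print(int(high_income.sum()))  # 1
```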


```python
def categorize_age(age):
    # Each branch only runs if the previous conditions failed,
    # so a single upper bound per branch is enough.
    if age < 25:
        return 'Young'
    elif age < 45:
        return 'Adult'
    elif age < 65:
        return 'Middle-Aged'
    else:
        return 'Senior'

df['age_group'] = df['age'].apply(categorize_age)
df
```

|   | age | workclass | education | marital.status | occupation | relationship | race | sex | hours.per.week | native.country | income | age_group |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 90 | Unknown | HS-grad | Widowed | Unknown | Not-in-family | White | Female | 40 | United-States | No | Senior |
| 1 | 82 | Private | HS-grad | Widowed | Exec-managerial | Not-in-family | White | Female | 18 | United-States | No | Senior |
| 2 | 66 | Unknown | Some-college | Widowed | Unknown | Unmarried | Black | Female | 40 | United-States | No | Senior |
| 3 | 54 | Private | 7th-8th | Divorced | Machine-op-inspct | Unmarried | White | Female | 40 | United-States | No | Middle-Aged |
| 4 | 41 | Private | Some-college | Separated | Prof-specialty | Own-child | White | Female | 40 | United-States | No | Adult |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 32556 | 22 | Private | Some-college | Never-married | Protective-serv | Not-in-family | White | Male | 40 | United-States | No | Young |
| 32557 | 27 | Private | Assoc-acdm | Married-civ-spouse | Tech-support | Wife | White | Female | 38 | United-States | No | Adult |
| 32558 | 40 | Private | HS-grad | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 40 | United-States | Yes | Adult |
| 32559 | 58 | Private | HS-grad | Widowed | Adm-clerical | Unmarried | White | Female | 40 | United-States | No | Middle-Aged |
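
The same binning can be done without a Python-level function by using `pd.cut`, which is vectorized and returns a categorical column. This is a sketch that checks the two approaches agree on a handful of boundary ages; `right=False` makes each bin closed on the left (for example `[25, 45)`), matching the `<` comparisons in `categorize_age`.

```python
import numpy as np
import pandas as pd


def categorize_age(age):
    if age < 25:
        return 'Young'
    elif age < 45:
        return 'Adult'
    elif age < 65:
        return 'Middle-Aged'
    else:
        return 'Senior'


ages = pd.Series([17, 24, 25, 44, 45, 64, 65, 90])

# Bins: [-inf, 25), [25, 45), [45, 65), [65, inf)
binned = pd.cut(
    ages,
    bins=[-np.inf, 25, 45, 65, np.inf],
    right=False,
    labels=['Young', 'Adult', 'Middle-Aged', 'Senior'],
)

print(binned.astype(str).tolist() == ages.apply(categorize_age).tolist())  # True
```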


```python
# The raw age column is no longer needed once age_group exists
df = df.drop(columns=['age'])
```

### 2.4 Slicing and Selecting

```python
# Select first 10 rows
df.head(10)
```

|   | workclass | education | marital.status | occupation | relationship | race | sex | hours.per.week | native.country | income | age_group |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Unknown | HS-grad | Widowed | Unknown | Not-in-family | White | Female | 40 | United-States | No | Senior |
| 1 | Private | HS-grad | Widowed | Exec-managerial | Not-in-family | White | Female | 18 | United-States | No | Senior |
| 2 | Unknown | Some-college | Widowed | Unknown | Unmarried | Black | Female | 40 | United-States | No | Senior |
| 3 | Private | 7th-8th | Divorced | Machine-op-inspct | Unmarried | White | Female | 40 | United-States | No | Middle-Aged |
| 4 | Private | Some-college | Separated | Prof-specialty | Own-child | White | Female | 40 | United-States | No | Adult |
| 5 | Private | HS-grad | Divorced | Other-service | Unmarried | White | Female | 45 | United-States | No | Adult |
| 6 | Private | 10th | Separated | Adm-clerical | Unmarried | White | Male | 40 | United-States | No | Adult |
| 7 | State-gov | Doctorate | Never-married | Prof-specialty | Other-relative | White | Female | 20 | United-States | Yes | Senior |
| 8 | Federal-gov | HS-grad | Divorced | Prof-specialty | Not-in-family | White | Female | 40 | United-States | No | Senior |
| 9 | Private | Some-college | Never-married | Craft-repair | Unmarried | White | Male | 60 | Unknown | Yes | Adult |


```python
# Select rows 20 to 30 by position (the stop value 31 is excluded)
df[20:31]
```

|   | workclass | education | marital.status | occupation | relationship | race | sex | hours.per.week | native.country | income | age_group |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 20 | Private | Bachelors | Never-married | Exec-managerial | Not-in-family | White | Male | 40 | United-States | Yes | Adult |
| 21 | Private | 11th | Separated | Sales | Not-in-family | White | Female | 42 | United-States | No | Adult |
| 22 | Private | HS-grad | Divorced | Sales | Unmarried | White | Female | 25 | United-States | No | Middle-Aged |
| 23 | Private | Some-college | Married-civ-spouse | Transport-moving | Husband | White | Male | 40 | United-States | No | Middle-Aged |
| 24 | Unknown | HS-grad | Married-civ-spouse | Unknown | Husband | White | Male | 32 | United-States | No | Middle-Aged |
| 25 | Private | Assoc-voc | Married-civ-spouse | Craft-repair | Husband | White | Male | 40 | United-States | No | Young |
| 26 | Private | 1st-4th | Married-civ-spouse | Craft-repair | Not-in-family | White | Male | 32 | Mexico | No | Adult |
| 27 | Private | 5th-6th | Married-civ-spouse | Other-service | Husband | White | Male | 40 | Greece | No | Middle-Aged |
| 28 | Self-emp-inc | 10th | Never-married | Transport-moving | Not-in-family | White | Male | 50 | United-States | Yes | Adult |
| 29 | Private | 10th | Never-married | Prof-specialty | Not-in-family | White | Male | 90 | United-States | Yes | Adult |


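```python
# loc selects by label; the end label 5 is included
df.loc[0:5, ['education', 'income', 'occupation']]
```

|   | education | income | occupation |
|---|---|---|---|
| 0 | HS-grad | No | Unknown |
| 1 | HS-grad | No | Exec-managerial |
| 2 | Some-college | No | Unknown |
| 3 | 7th-8th | No | Machine-op-inspct |
| 4 | Some-college | No | Prof-specialty |
| 5 | HS-grad | No | Other-service |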


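```python
# iloc selects by position; rows 0-4 and columns 2-5 (end positions excluded)
df.iloc[0:5, 2:6]
```

|   | marital.status | occupation | relationship | race |
|---|---|---|---|---|
| 0 | Widowed | Unknown | Not-in-family | White |
| 1 | Widowed | Exec-managerial | Not-in-family | White |
| 2 | Widowed | Unknown | Unmarried | Black |
| 3 | Divorced | Machine-op-inspct | Unmarried | White |
| 4 | Separated | Prof-specialty | Own-child | White |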
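
The last two cells highlight an easy-to-miss difference: `loc` slices include the end label, while `iloc` slices exclude the end position. The sketch below makes that explicit on the first six rows copied from the outputs above, and also shows selecting rows by condition with a boolean mask (conditions are combined with `&` or `|`, each wrapped in parentheses).

```python
import pandas as pd

# First six rows of the education / income / hours.per.week columns shown above.
df_small = pd.DataFrame({
    'education': ['HS-grad', 'HS-grad', 'Some-college', '7th-8th', 'Some-college', 'HS-grad'],
    'income': ['No'] * 6,
    'hours.per.week': [40, 18, 40, 40, 40, 45],
})

print(len(df_small.loc[0:3]))   # 4 rows: labels 0, 1, 2, 3
print(len(df_small.iloc[0:3]))  # 3 rows: positions 0, 1, 2

# Rows where hours.per.week is at least 40, with two chosen columns.
full_time = df_small.loc[df_small['hours.per.week'] >= 40, ['education', 'hours.per.week']]
print(full_time)

# Two conditions combined with &.
mask = (df_small['hours.per.week'] >= 40) & (df_small['education'] == 'HS-grad')
print(df_small.loc[mask])
```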
