### Table of Contents
- [Creating New DataFrame](#Creating-New-DataFrame)
- [Accessing Data Element](#Accessing-Data-Element)

In [None]:
# Import pandas using alias pd
import pandas as pd 
from pathlib import Path

### Creating New DataFrame
--------------------------

A DataFrame can be created with the `pd.DataFrame()` constructor.

Its content can be supplied as a Python **dictionary of lists**: each **key** in the dictionary becomes a column name, and each **value** is the list of entries for that column.

In [3]:
# --- 1. Create a new DataFrame --- 
df = pd.DataFrame({"Year_Birth":[1999,2000],
                   "Income":[36000,38000]})

print(df)


   Year_Birth  Income
0        1999   36000
1        2000   38000
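Because each list supplies one column, the lists line up row by row and must all have the same length. The minimal check below reuses the values from the cell above; the flag `mismatch_rejected` is a hypothetical name added only for illustration.

```python
import pandas as pd

# Dictionary keys become column names, in insertion order
df = pd.DataFrame({"Year_Birth": [1999, 2000],
                   "Income": [36000, 38000]})
print(list(df.columns))  # ['Year_Birth', 'Income']
print(df.shape)          # (2, 2)

# Lists of different lengths cannot be lined up into rows
mismatch_rejected = False
try:
    pd.DataFrame({"Year_Birth": [1999, 2000],
                  "Income": [36000]})  # one value short
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```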


By default, the row index of a new DataFrame is a sequence of integers ascending from 0 (0, 1, 2, 3, ...). A different row index can be assigned when creating the DataFrame by passing the `index` argument.

In [4]:
# --- 2. Create a new DataFrame with index ---
df_with_index = pd.DataFrame({"Year_Birth": [1999, 2000],
                              "Income": [36000, 38000]},
                             index=[1, 2])

print(df_with_index)

   Year_Birth  Income
1        1999   36000
2        2000   38000
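With a custom index, the row *label* and the row *position* no longer coincide. A small sketch, reusing `df_with_index` from above, of how `.loc` (label-based) and `.iloc` (position-based) then pick different rows:

```python
import pandas as pd

df_with_index = pd.DataFrame({"Year_Birth": [1999, 2000],
                              "Income": [36000, 38000]},
                             index=[1, 2])

# .loc looks rows up by index label: label 1 is the first row here
by_label = df_with_index.loc[1]
# .iloc looks rows up by integer position: position 1 is the second row
by_position = df_with_index.iloc[1]

print(by_label["Year_Birth"])     # 1999
print(by_position["Year_Birth"])  # 2000
```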


### Accessing Data Element

Columns of a DataFrame can be accessed with the `.` (attribute) operator or the `[]` (indexing) operator.

In [5]:
# Loading the dataset
csv_file_path = Path('data/corrected_marketing_campaign.csv')

if csv_file_path.exists():
    df_marketing = pd.read_csv(csv_file_path, sep = ',') 
    print(f'Data loaded successfully: {df_marketing.shape[0]} rows, {df_marketing.shape[1]} columns.')
else:
    raise FileNotFoundError('Dataset not found. Please check the path.')

Data loaded successfully: 2240 rows, 29 columns.
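If `data/corrected_marketing_campaign.csv` is not at hand, the same loading step can be tried on an in-memory stand-in. This is only a sketch: `csv_text` is a hypothetical snippet built from a few columns of the first three rows shown later in this notebook, and `io.StringIO` works because `pd.read_csv` accepts any file-like object as well as a path.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV file (values copied from outputs in this notebook)
csv_text = """ID,Year_Birth,Education,Income
5524,1957,Graduation,58138.0
2174,1954,Graduation,46344.0
4141,1965,Graduation,71613.0
"""

df_sample = pd.read_csv(io.StringIO(csv_text))  # comma is the default separator
rows, cols = df_sample.shape                    # .shape is (rows, columns)
print(f"Data loaded successfully: {rows} rows, {cols} columns.")
```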


In [6]:
# --- 3. Accessing specific column using `.` operator ---
# For example, accessing the 'Education' column
# `.` access only works when the column name is a valid Python identifier (no spaces or special characters) that does not clash with an existing DataFrame attribute or method.
df_marketing.Education

0       Graduation
1       Graduation
2       Graduation
3       Graduation
4              PhD
           ...    
2235    Graduation
2236           PhD
2237    Graduation
2238        Master
2239           PhD
Name: Education, Length: 2240, dtype: object

In [7]:
# --- 4. Accessing a specific column using `[]` operator ---
df_marketing['Education']

0       Graduation
1       Graduation
2       Graduation
3       Graduation
4              PhD
           ...    
2235    Graduation
2236           PhD
2237    Graduation
2238        Master
2239           PhD
Name: Education, Length: 2240, dtype: object
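The two operators are not fully interchangeable. In the sketch below the column names `"Customer Name"` and `"count"` are hypothetical, chosen to show the two cases where `.` access breaks down while `[]` still works:

```python
import pandas as pd

# Hypothetical column names: one with a space, one that matches a DataFrame method
df = pd.DataFrame({"Customer Name": ["A", "B"],
                   "count": [3, 5]})

# A name with a space is not a valid Python identifier, so only [] can reach it
names = df["Customer Name"]

# "count" clashes with DataFrame.count(), so df.count is the method, not the column
print(callable(df.count))  # True
counts = df["count"]       # [] always returns the column
print(counts.tolist())     # [3, 5]
```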

In [8]:
# --- 5. Accessing multiple columns using `[]` operator ---
# For example, accessing 'Education' and 'Income' columns
df_marketing[['Education', 'Income']]

| | Education | Income |
|---|---|---|
| 0 | Graduation | 58138.0 |
| 1 | Graduation | 46344.0 |
| 2 | Graduation | 71613.0 |
| 3 | Graduation | 26646.0 |
| 4 | PhD | 58293.0 |
| ... | ... | ... |
| 2235 | Graduation | 61223.0 |
| 2236 | PhD | 64014.0 |
| 2237 | Graduation | 56981.0 |
| 2238 | Master | 69245.0 |
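Passing a single label to `[]` returns a Series, while passing a list of labels (even a one-element list) returns a DataFrame. A minimal sketch, using the first three `Education`/`Income` values from the output above as sample data:

```python
import pandas as pd

# Sample data: first three rows of Education and Income from the output above
df = pd.DataFrame({"Education": ["Graduation", "Graduation", "Graduation"],
                   "Income": [58138.0, 46344.0, 71613.0]})

single = df["Income"]               # one label        -> Series
double = df[["Income"]]             # list of one label -> DataFrame
pair = df[["Education", "Income"]]  # list of labels   -> DataFrame

print(type(single).__name__)  # Series
print(type(double).__name__)  # DataFrame
print(pair.shape)             # (3, 2)
```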


Accessing a single row in a DataFrame

In [9]:
# --- 6. Using .iloc[] for position-based access ---
df_marketing.iloc[0]  # First row (not displayed: a notebook cell shows only its last expression)
df_marketing.iloc[3]  # Fourth row (position 3), displayed below

ID                           6182
Year_Birth                   1984
Education              Graduation
Marital_Status           Together
Income                    26646.0
Kidhome                         1
Teenhome                        0
Dt_Customer            10-02-2014
Recency                        26
MntWines                       11
MntFruits                       4
MntMeatProducts                20
MntFishProducts                10
MntSweetProducts                3
MntGoldProds                    5
NumDealsPurchases               2
NumWebPurchases                 2
NumCatalogPurchases             0
NumStorePurchases               4
NumWebVisitsMonth               6
AcceptedCmp3                    0
AcceptedCmp4                    0
AcceptedCmp5                    0
AcceptedCmp1                    0
AcceptedCmp2                    0
Complain                        0
Z_CostContact                   3
Z_Revenue                      11
Response                        0
Name: 3, dtype: object
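A single row comes back as a Series whose index is the DataFrame's column names and whose `name` is the row's index label; this is where `Name: 3` in the output above comes from. Because the row mixes numbers and strings, its dtype is `object`. A small sketch using values copied from the first three rows of the dataset:

```python
import pandas as pd

# Sample data copied from the first three rows shown in this notebook
df = pd.DataFrame({"Year_Birth": [1957, 1954, 1965],
                   "Education": ["Graduation", "Graduation", "Graduation"],
                   "Income": [58138.0, 46344.0, 71613.0]})

row = df.iloc[2]
print(list(row.index))  # ['Year_Birth', 'Education', 'Income']
print(row.name)         # 2  (the row's index label)
print(row.dtype)        # object  (common dtype of int, str and float values)
```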

Accessing multiple rows

In [10]:
# --- 7. Accessing the first three rows (positions 0 to 2; the stop position 3 is excluded) ---
df_marketing.iloc[0:3]  

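| | ID | Year_Birth | Education | Marital_Status | Income | Kidhome | Teenhome | Dt_Customer | Recency | MntWines | ... | NumWebVisitsMonth | AcceptedCmp3 | AcceptedCmp4 | AcceptedCmp5 | AcceptedCmp1 | AcceptedCmp2 | Complain | Z_CostContact | Z_Revenue | Response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5524 | 1957 | Graduation | Single | 58138.0 | 0 | 0 | 04-09-2012 | 58 | 635 | ... | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 1 |
| 1 | 2174 | 1954 | Graduation | Single | 46344.0 | 1 | 1 | 08-03-2014 | 38 | 11 | ... | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
| 2 | 4141 | 1965 | Graduation | Together | 71613.0 | 0 | 0 | 21-08-2013 | 26 | 426 | ... | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |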
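Slicing behaves differently in the two accessors: `.iloc` slices by position and excludes the stop, whereas `.loc` slices by label and includes the stop. A sketch with `ID` and `Year_Birth` values copied from the first four rows shown in this notebook; with the default index, `iloc[0:3]` and `loc[0:2]` select the same rows:

```python
import pandas as pd

# Sample data copied from rows 0 to 3 of the dataset outputs above
df = pd.DataFrame({"ID": [5524, 2174, 4141, 6182],
                   "Year_Birth": [1957, 1954, 1965, 1984]})

first_three = df.iloc[0:3]  # positions 0, 1, 2 (stop position 3 excluded)
also_three = df.loc[0:2]    # labels 0, 1, 2 (stop label 2 included)

print(len(first_three), len(also_three))  # 3 3
print(first_three.equals(also_three))     # True
```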


In [11]:
# --- 8. Accessing the first and third rows ---
df_marketing.iloc[[0,2]] 

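| | ID | Year_Birth | Education | Marital_Status | Income | Kidhome | Teenhome | Dt_Customer | Recency | MntWines | ... | NumWebVisitsMonth | AcceptedCmp3 | AcceptedCmp4 | AcceptedCmp5 | AcceptedCmp1 | AcceptedCmp2 | Complain | Z_CostContact | Z_Revenue | Response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5524 | 1957 | Graduation | Single | 58138.0 | 0 | 0 | 04-09-2012 | 58 | 635 | ... | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 1 |
| 2 | 4141 | 1965 | Graduation | Together | 71613.0 | 0 | 0 | 21-08-2013 | 26 | 426 | ... | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |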
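When `.iloc` receives a list of positions, the selected rows keep their original index labels, appear in the order given by the list, and the result stays a DataFrame even for a one-element list. A sketch with sample values copied from the first three rows above:

```python
import pandas as pd

# Sample data copied from rows 0 to 2 shown above
df = pd.DataFrame({"ID": [5524, 2174, 4141],
                   "Marital_Status": ["Single", "Single", "Together"]})

picked = df.iloc[[0, 2]]
print(picked.index.tolist())     # [0, 2]  original labels are kept

reordered = df.iloc[[2, 0]]
print(reordered["ID"].tolist())  # [4141, 5524]  rows follow the list order

one_row = df.iloc[[1]]           # a one-element list still gives a DataFrame
print(type(one_row).__name__)    # DataFrame
```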


Accessing a single value of a column by its row index label

In [12]:
# --- 9. Accessing row label 2 (the third row, since the default index starts at 0) of a specific column ---
df_marketing['Year_Birth'][2]

1965
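pandas also provides dedicated scalar accessors: `.at[row_label, column_label]` and `.iat[row_position, column_position]`. A sketch comparing them with the chained lookup above, on sample values copied from the first three rows; for assigning a value, a single `.at`/`.loc` call is generally preferred over chained indexing, which may write to a temporary copy instead of the DataFrame.

```python
import pandas as pd

# Sample data copied from rows 0 to 2 shown earlier
df = pd.DataFrame({"Year_Birth": [1957, 1954, 1965],
                   "Income": [58138.0, 46344.0, 71613.0]})

chained = df["Year_Birth"][2]      # column first, then row label
by_label = df.at[2, "Year_Birth"]  # row label, column label
by_position = df.iat[2, 0]         # row position, column position

print(chained, by_label, by_position)  # 1965 1965 1965
```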

In [16]:
# --- 10. Accessing a specific column in a row ---
df_marketing.iloc[1]['Income']  

46344.0
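`df_marketing.iloc[1]['Income']` first builds the whole row as a Series and then picks one entry from it. The same value can be fetched in one step, either with `.loc[row_label, column_label]` or with `.iloc` plus the column position from `columns.get_loc`. A sketch on sample values copied from the first three rows; note that `.loc[1, ...]` matches `.iloc[1]` only because the default index makes label 1 equal to position 1.

```python
import pandas as pd

# Sample data copied from rows 0 to 2 shown earlier
df = pd.DataFrame({"Year_Birth": [1957, 1954, 1965],
                   "Income": [58138.0, 46344.0, 71613.0]})

two_step = df.iloc[1]["Income"]          # row Series first, then the column

col_pos = df.columns.get_loc("Income")   # position of the 'Income' column
one_step_iloc = df.iloc[1, col_pos]      # row position, column position
one_step_loc = df.loc[1, "Income"]       # row label, column label

print(two_step, one_step_iloc, one_step_loc)  # 46344.0 46344.0 46344.0
```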