# **Pandas Fundamentals**

In [2]:
import pandas as pd
import numpy as np

## **1. Pandas Series**
A **Series** in pandas is a **one-dimensional** array-like object that can hold various types of data, such as integers, floats, and strings.

### **Key Features of Series:**
- Represents a **single column** of data, similar to an Excel column.
- Contains **index** (labels) and **values** (data).
- Can be created from **lists, NumPy arrays, or dictionaries**.
- Has a single **dtype** for all of its elements; when the values cannot share a more specific type, pandas falls back to the generic `object` dtype.
- Supports **powerful built-in methods** for data manipulation.
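
The features listed above can be sketched with a small dictionary-based Series. The product labels and rates here are made-up illustration values, not data from the lending files used later.

```python
import pandas as pd

# Hypothetical data: product labels (index) mapped to daily rates (values).
rates = pd.Series({'a': 40, 'b': 45, 'c': 50})

print(rates.index.tolist())   # the labels
print(rates.values.tolist())  # the data
print(rates['b'])             # look up a value by its label

# Numbers and strings cannot share a specific type,
# so pandas stores them with the generic object dtype.
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)
```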

### **Creating a Pandas Series**

In [3]:
products = ['a', 'b', 'c', 'd']
products_series = pd.Series(products)
print(products_series)

0    a
1    b
2    c
3    d
dtype: object


- The left column represents the **index**.
- The **dtype** (data type) is inferred from the values; the list of strings is stored as `object` here.

In [4]:
daily_rates_dollars = pd.Series([40, 45, 50, 60])
print(daily_rates_dollars)

0    40
1    45
2    50
3    60
dtype: int64


### **Creating a Series from a NumPy Array**

In [5]:
array_a = np.array([10, 20, 30, 40, 50])
series_a = pd.Series(array_a)
print(series_a)

0    10
1    20
2    30
3    40
4    50
dtype: int32
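
The `int32` in the output above is inherited from the NumPy array rather than chosen by pandas. NumPy's default integer width depends on the platform and NumPy version, so the same cell may well print `int64` on another machine. A minimal sketch that sets the width explicitly (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# The Series keeps whatever dtype the NumPy array already has.
small = pd.Series(np.array([10, 20, 30], dtype=np.int32))
wide = pd.Series(np.array([10, 20, 30], dtype=np.int64))
print(small.dtype)
print(wide.dtype)

# astype() returns a new Series with the requested dtype.
as_float = small.astype('float64')
print(as_float.dtype)
```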


## **2. Working with Methods in Pandas**
### **Attributes vs. Methods**
| **Attributes (Passive)** | **Methods (Active)** |
|-------------------------|---------------------|
| Store metadata about an object | Perform actions on the object |
| Do not require parentheses | Require parentheses when called |
| Example: `.shape` (returns dimensions) | Example: `.head()` (returns first rows) |
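
A short sketch of the distinction in the table, using the daily rates from earlier. The last line is only there to show that a method name without parentheses refers to the method itself, not to its result.

```python
import pandas as pd

rates = pd.Series([40, 45, 50, 60])

# Attributes: read without parentheses.
print(rates.shape)
print(rates.dtype)

# Methods: called with parentheses, optionally with arguments.
first_two = rates.head(2)
print(first_two)

# Without parentheses you get the method object, not the first rows.
print(callable(rates.head))
```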

### **Example: Using Methods on a Series**

In [6]:
start_dates_deposits = pd.Series({
    '7/4/2014': 2000,
    '7/4/2015': 2000,
    '7/4/2016': 2200,
    '7/4/2017': 4000,
    '7/4/2019': 2000
})

print(start_dates_deposits.sum())  # Sum of deposits
print(start_dates_deposits.min())  # Minimum value
print(start_dates_deposits.max())  # Maximum value
print(start_dates_deposits.idxmax())  # Index of max value
print(start_dates_deposits.idxmin())  # Index of min value
print(start_dates_deposits.head(2))  # First 2 rows
print(start_dates_deposits.tail(2))  # Last 2 rows

12200
2000
4000
7/4/2017
7/4/2014
7/4/2014    2000
7/4/2015    2000
dtype: int64
7/4/2017    4000
7/4/2019    2000
dtype: int64
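
Note that the minimum deposit of 2000 occurs three times, yet `.idxmin()` printed only `7/4/2014`: it returns the label of the first occurrence. One possible way to list every label that shares the minimum is a boolean filter, sketched here on the same data:

```python
import pandas as pd

deposits = pd.Series({
    '7/4/2014': 2000,
    '7/4/2015': 2000,
    '7/4/2016': 2200,
    '7/4/2017': 4000,
    '7/4/2019': 2000
})

# idxmin() reports only the first label holding the minimum.
first_min = deposits.idxmin()
print(first_min)

# Keep the rows equal to the minimum, then read off their labels.
all_min = deposits[deposits == deposits.min()].index.tolist()
print(all_min)
```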


## **3. Parameters and Arguments in Pandas**
- Pandas methods often accept **parameters** that modify how they operate.
- A **parameter** is a named variable in a function's definition; an **argument** is the value passed to it in a call.
- Example:

In [7]:
start_dates_deposits.head(n=2)  # Shows first 2 rows (default is 5)

7/4/2014    2000
7/4/2015    2000
dtype: int64
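
As a small illustration, the argument for `n` can be passed by keyword or by position, and omitting it uses the default of 5. The numbers below are arbitrary sample values.

```python
import pandas as pd

numbers = pd.Series([15, 1000, 23, 45, 444, 7, 8])

default_head = numbers.head()   # no argument: n falls back to its default, 5
by_keyword = numbers.head(n=2)  # argument passed by parameter name
by_position = numbers.head(2)   # same argument passed by position

print(len(default_head))
print(by_keyword.equals(by_position))
```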

## **4. Using `.unique()` and `.nunique()`**
| Method | Purpose |
|--------|---------|
| `.unique()` | Returns all unique values in a Series |
| `.nunique()` | Returns the number of unique values |

Example:

In [8]:
data = pd.read_csv(r"C:\Users\user\Desktop\studying\1- python\dummy data\Location.csv")
location_data = data['Location']

print(location_data.nunique())  # Number of unique values
print(location_data.unique())  # Array of unique values

296
['Location 3' 'Location 6' 'Location 8' 'Location 26' 'Location 34'
 'Location 25' 'Location 46' 'Location 156' 'Location 21' 'Location 13'
 'Location 579' 'Location 602' 'Location 10' 'Location 44' 'Location 30'
 'Location 48' 'Location 196' 'Location 64' 'Location 91' 'Location 62'
 'Location 75' 'Location 42' 'Location 233' 'Location 95' 'Location 78'
 'Location 61' 'Location 87' 'Location 19' 'Location 115' 'Location 350'
 'Location 377' 'Location 17' 'Location 113' 'Location 81' 'Location 58'
 'Location 212' 'Location 53' 'Location 337' 'Location 41' 'Location 632'
 'Location 73' 'Location 214' 'Location 218' 'Location 38' 'Location 172'
 'Location 197' 'Location 101' 'Location 185' 'Location 129'
 'Location 235' 'Location 142' 'Location 50' 'Location 76' 'Location 11'
 'Location 33' 'Location 22' 'Location 145' 'Location 203' 'Location 94'
 'Location 573' 'Location 27' 'Location 186' 'Location 4' 'Location 70'
 'Location 45' 'Location 262' 'Location 111' 'Location 84' ... [output truncated]

In [9]:
location_data.describe()

count            1043
unique            296
top       Location 25
freq               31
Name: Location, dtype: object

- For non-numeric data, `.describe()` reports **count**, **unique**, **top**, and **freq**.
- **top** is the most frequent value, and **freq** is how many times it appears.
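
Because the CSV file above is local to the author's machine, here is a self-contained sketch of the same calls on a tiny, made-up list of locations:

```python
import pandas as pd

# Hypothetical sample; the real Location column has 1043 rows.
locations = pd.Series([
    'Location 3', 'Location 6', 'Location 3',
    'Location 8', 'Location 3', 'Location 6'
])

print(locations.nunique())       # number of distinct values
print(list(locations.unique()))  # distinct values in order of first appearance

summary = locations.describe()
print(summary['top'], summary['freq'])  # most frequent value and its count
```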

## **5. Sorting Data with `.sort_values()`**
- **Ascending Sort (Default)**:


In [10]:
numbers = pd.Series([15, 1000, 23, 45, 444])
print(numbers.sort_values())  # Ascending order

0      15
2      23
3      45
4     444
1    1000
dtype: int64


- **Descending Sort**:


In [11]:
print(numbers.sort_values(ascending=False))  # Descending order

1    1000
4     444
3      45
2      23
0      15
dtype: int64
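
In both outputs the original index labels travel with their values, and `numbers` itself is left unchanged because `.sort_values()` returns a new Series. A brief sketch of both points, with `ignore_index=True` as one way to get a fresh 0..n-1 index:

```python
import pandas as pd

numbers = pd.Series([15, 1000, 23, 45, 444])

sorted_numbers = numbers.sort_values()
print(numbers.tolist())               # original order is untouched
print(sorted_numbers.index.tolist())  # labels follow their values

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = numbers.sort_values(ignore_index=True)
print(renumbered.index.tolist())
```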


## **6. Pandas DataFrame**
A **DataFrame** is a **two-dimensional** table with rows and columns, similar to an Excel sheet.

### **Key Features:**
- Each **column** in a DataFrame is a **Series**.
- Different columns can hold different data types; each column has its own dtype.
- Has **index** (row labels) and **column names**.
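
These features can be seen on a small DataFrame built from a dictionary. The column names mirror the lending data used later, but the second row's values are invented for illustration.

```python
import pandas as pd

# Hypothetical two-row table; each dictionary key becomes a column.
loans = pd.DataFrame(
    {
        'Product': ['Product B', 'Product D'],
        'Deposit': [2200, 2000],
        'TotalPrice': [17600.0, 16600.0],
    },
    index=['LoanID_1', 'LoanID_2'],
)

# Selecting one column gives back a Series.
product_col = loans['Product']
print(type(product_col).__name__)

# Each column carries its own dtype.
print(loans['Deposit'].dtype)
print(loans['TotalPrice'].dtype)
```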

### **Creating a DataFrame**

In [12]:
array_a = np.array([[3, 2, 1], [6, 3, 2]])
df = pd.DataFrame(array_a, columns=['Column 1', 'Column 2', 'Column 3'], index=['Row 1', 'Row 2'])
print(df)

       Column 1  Column 2  Column 3
Row 1         3         2         1
Row 2         6         3         2


## **7. Common Attributes for DataFrames**

In [13]:
data = pd.read_csv(r"C:\Users\user\Desktop\studying\1- python\dummy data\Lending-company.csv", index_col='LoanID')
lending_data = data.copy()

print(lending_data.index)  # Get index
print(lending_data.columns)  # Get column names
print(lending_data.axes)  # List of both axes: [row index, column names]
print(lending_data.dtypes)  # Check the data type of each column
print(lending_data.shape)  # Get dimensions (rows, columns)
print(lending_data.values)  # Get the data as a NumPy array

Index([   1,    2,    3,    4,    5,    6,    7,    8,    9,   10,
       ...
       1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1043],
      dtype='int64', name='LoanID', length=1043)
Index(['StringID', 'Product', 'CustomerGender', 'Location', 'Region',
       'TotalPrice', 'StartDate', 'Deposit', 'DailyRate', 'TotalDaysYr',
       'AmtPaid36', 'AmtPaid60', 'AmtPaid360', 'LoanStatus'],
      dtype='object')
[Index([   1,    2,    3,    4,    5,    6,    7,    8,    9,   10,
       ...
       1034, 1035, 1036, 1037, 1038, 1039, 1040, 1041, 1042, 1043],
      dtype='int64', name='LoanID', length=1043), Index(['StringID', 'Product', 'CustomerGender', 'Location', 'Region',
       'TotalPrice', 'StartDate', 'Deposit', 'DailyRate', 'TotalDaysYr',
       'AmtPaid36', 'AmtPaid60', 'AmtPaid360', 'LoanStatus'],
      dtype='object')]
StringID           object
Product            object
CustomerGender     object
Location           object
Region             object
TotalPrice        float
... [output truncated]
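
Since the CSV is not available outside the author's machine, the same attributes can be tried on a small stand-in DataFrame. The values below are hypothetical. The sketch uses `.to_numpy()`, which the pandas documentation recommends over `.values` for getting a NumPy array.

```python
import pandas as pd

# Hypothetical stand-in for the lending data, indexed by LoanID.
df = pd.DataFrame(
    {'Deposit': [2200, 2000, 2200], 'DailyRate': [45, 40, 45]},
    index=pd.Index([1, 2, 3], name='LoanID'),
)

print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # column names
print(df.index.name)        # name of the row index
print(len(df.axes))         # two axes: rows and columns

arr = df.to_numpy()         # the underlying data as a NumPy array
print(arr.shape)
```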

## **8. Data Selection in Pandas DataFrames**
### **Selecting Columns**

In [15]:
print(lending_data['Location'])  # Select a column
print(lending_data[['Location', 'Product','StringID']])  # Select multiple columns

LoanID
1        Location 3
2        Location 6
3        Location 8
4       Location 26
5       Location 34
           ...     
1039    Location 73
1040    Location 82
1041    Location 11
1042    Location 26
1043    Location 94
Name: Location, Length: 1043, dtype: object
           Location    Product     StringID
LoanID                                     
1        Location 3  Product B     LoanID_1
2        Location 6  Product D     LoanID_2
3        Location 8  Product B     LoanID_3
4       Location 26  Product A     LoanID_4
5       Location 34  Product B     LoanID_5
...             ...        ...          ...
1039    Location 73  Product B  LoanID_1039
1040    Location 82  Product A  LoanID_1040
1041    Location 11  Product A  LoanID_1041
1042    Location 26  Product B  LoanID_1042
1043    Location 94  Product A  LoanID_1043

[1043 rows x 3 columns]
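
The two outputs above differ in type: single brackets return a Series, while a list of column names (double brackets) returns a DataFrame, even when the list has only one name. A minimal sketch on a two-row sample:

```python
import pandas as pd

df = pd.DataFrame({
    'Location': ['Location 3', 'Location 6'],
    'Product': ['Product B', 'Product D'],
    'StringID': ['LoanID_1', 'LoanID_2'],
})

one_column = df['Location']           # Series
one_column_frame = df[['Location']]   # DataFrame with a single column
two_columns = df[['Location', 'Product']]

print(type(one_column).__name__)
print(type(one_column_frame).__name__)
print(two_columns.columns.tolist())
```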


### **Using `.iloc[]` (Integer Location Indexing)**
- **Position-based indexing**: rows and columns are selected by zero-based integer position, and the end of a slice is excluded.


In [16]:
print(lending_data.iloc[1, 3])  # Second row, fourth column
print(lending_data.iloc[0:3, 1:4])  # Row positions 0-2, column positions 1-3 (slice end excluded)
print(lending_data.iloc[:, 2])  # All rows, third column
print(lending_data.iloc[[1, 3], [1, 3]])  # Second & fourth rows, second & fourth columns

Location 6
          Product CustomerGender    Location
LoanID                                      
1       Product B         Female  Location 3
2       Product D         Female  Location 6
3       Product B           Male  Location 8
LoanID
1             Female
2             Female
3               Male
4               Male
5             Female
            ...     
1039            Male
1040            Male
1041    NotSpecified
1042          Female
1043    NotSpecified
Name: CustomerGender, Length: 1043, dtype: object
          Product     Location
LoanID                        
2       Product D   Location 6
4       Product A  Location 26


## **9. Using `.loc[]` for Label-Based Indexing**
- **Explicit label-based indexing**: rows and columns are selected by their labels, and a label slice includes both endpoints.

In [17]:
data = pd.read_csv(r"C:\Users\user\Desktop\studying\1- python\dummy data\Lending-company.csv", index_col='StringID')

print(data.loc['LoanID_1'])  # Select row by index label
print(data.loc[:, 'Product'])  # Select column by label
print(data.loc[['LoanID_1', 'LoanID_3'], ['Product', 'Location']])  # Select specific rows & columns

LoanID                     1
Product            Product B
CustomerGender        Female
Location          Location 3
Region              Region 2
TotalPrice           17600.0
StartDate         04/07/2018
Deposit                 2200
DailyRate                 45
TotalDaysYr              365
AmtPaid36               3221
AmtPaid60               4166
AmtPaid360             14621
LoanStatus            Active
Name: LoanID_1, dtype: object
StringID
LoanID_1       Product B
LoanID_2       Product D
LoanID_3       Product B
LoanID_4       Product A
LoanID_5       Product B
                 ...    
LoanID_1039    Product B
LoanID_1040    Product A
LoanID_1041    Product A
LoanID_1042    Product B
LoanID_1043    Product A
Name: Product, Length: 1043, dtype: object
            Product    Location
StringID                       
LoanID_1  Product B  Location 3
LoanID_3  Product B  Location 8
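
One difference worth trying out: a label slice with `.loc[]` includes its end label, whereas a position slice with `.iloc[]` excludes its end position. A small sketch, rebuilding the first four rows shown above with `StringID` as the index:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Product': ['Product B', 'Product D', 'Product B', 'Product A'],
        'Location': ['Location 3', 'Location 6', 'Location 8', 'Location 26'],
    },
    index=pd.Index(['LoanID_1', 'LoanID_2', 'LoanID_3', 'LoanID_4'], name='StringID'),
)

row = df.loc['LoanID_1']                                # one row, returned as a Series
label_slice = df.loc['LoanID_1':'LoanID_3', 'Product']  # end label is included
position_slice = df.iloc[0:2]                           # end position is excluded

print(row['Product'])
print(label_slice.index.tolist())
print(len(position_slice))
```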


## **Conclusion**
- **Series**: One-dimensional labeled array.
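- **DataFrame**: Two-dimensional table with labeled rows and columns; each column is a Series.
- **Methods & Attributes**: Attributes (no parentheses) describe an object; methods (called with parentheses) act on it.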
- **Indexing**: `.iloc[]` (integer-based) vs `.loc[]` (label-based).

