# Loss Curves for Neural Networks
**Objectives**
- Analyze and interpret training and testing loss curves for tabular data.
- Identify common training issues such as overfitting, underfitting, and instability.
- Experiment with model hyperparameters and training parameters to improve performance.


---

The **UCI Adult Income dataset** is a tabular dataset used for binary classification tasks. The goal is to predict whether an individual's income exceeds $50,000 per year based on census data.

**Details**:
- **Features:** 14 attributes (e.g., age, education, occupation).
- **Target:** Binary class (income >50K or <=50K).
- **Size:** 32,561 rows (the `adult.data` file).

**Main Tasks**:
1. Load and preprocess the dataset.
2. Split it into a training and test set.
3. Train a neural network and analyze the training/test loss curves.
4. Perform experiments to understand how model capacity and learning rate impact performance.


----

## Part 1: Data Preparation

Objectives:
1. Load the dataset using `pandas`.
2. Split the data into a training and test set (say, 90%/10%).
3. Prepare the data for training and inference: handle missing values, encode categorical features, and normalize the numeric features.



In [1]:
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"

columns = ["age", "workclass", "fnlwgt", "education", "education-num", "marital-status",
           "occupation", "relationship", "race", "sex", "capital-gain", "capital-loss",
           "hours-per-week", "native-country", "income"]

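# The separator is a regex (comma plus optional whitespace), so it is written as a raw string
# to keep `\s` from being read as a string escape.
data = pd.read_csv(url, names=columns, sep=r',\s*', engine='python', na_values="NA")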

In [4]:
# your code starts here
data.info()
data.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32561 entries, 0 to 32560
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   age             32561 non-null  int64 
 1   workclass       32561 non-null  object
 2   fnlwgt          32561 non-null  int64 
 3   education       32561 non-null  object
 4   education-num   32561 non-null  int64 
 5   marital-status  32561 non-null  object
 6   occupation      32561 non-null  object
 7   relationship    32561 non-null  object
 8   race            32561 non-null  object
 9   sex             32561 non-null  object
 10  capital-gain    32561 non-null  int64 
 11  capital-loss    32561 non-null  int64 
 12  hours-per-week  32561 non-null  int64 
 13  native-country  32561 non-null  object
 14  income          32561 non-null  object
dtypes: int64(6), object(9)
memory usage: 3.7+ MB


| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |


In [7]:
for col in data.columns:
    print(data[col].value_counts())
    print()

age
36    898
31    888
34    886
23    877
35    876
     ... 
83      6
88      3
85      3
86      1
87      1
Name: count, Length: 73, dtype: int64

workclass
Private             22696
Self-emp-not-inc     2541
Local-gov            2093
?                    1836
State-gov            1298
Self-emp-inc         1116
Federal-gov           960
Without-pay            14
Never-worked            7
Name: count, dtype: int64

fnlwgt
123011    13
203488    13
164190    13
148995    12
113364    12
          ..
174981     1
77774      1
134069     1
44777      1
98106      1
Name: count, Length: 21648, dtype: int64

education
HS-grad         10501
Some-college     7291
Bachelors        5355
Masters          1723
Assoc-voc        1382
11th             1175
Assoc-acdm       1067
10th              933
7th-8th           646
Prof-school       576
9th               514
12th              433
Doctorate         413
5th-6th           333
1st-4th           168
Preschool          51
Name: count, dtype: int64

In [None]:
# Columns that use "?" as a placeholder for missing values: workclass, occupation, native-country
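
One possible way to cover the three Part 1 objectives is sketched below. It is a minimal sketch under several assumptions that are not prescribed by the assignment: the helper name `preprocess` is hypothetical, missing categorical values are kept as their own `"Unknown"` level, missing numeric values are filled with the training median, and the one-hot columns are built on the full table so that both splits share the same columns.

```python
import numpy as np
import pandas as pd


def preprocess(df, target="income", test_frac=0.1, seed=0):
    """Return (X_train, y_train, X_test, y_test) as float32 NumPy arrays."""
    df = df.replace("?", np.nan)  # the raw file marks missing values with "?"
    y = (df[target] == ">50K").to_numpy().astype("float32")
    X = df.drop(columns=[target])

    # Shuffle the row positions once and hold out the first test_frac of them.
    order = np.random.default_rng(seed).permutation(len(X))
    n_test = max(1, round(test_frac * len(X)))
    test_idx, train_idx = order[:n_test], order[n_test:]

    num_cols = X.select_dtypes(include="number").columns
    cat_cols = X.columns.difference(num_cols)

    # Categorical features: missing becomes its own "Unknown" level, then one-hot.
    cat = pd.get_dummies(X[cat_cols].fillna("Unknown"), dtype="float64")

    # Numeric features: impute and standardize with training-set statistics only.
    num = X[num_cols].astype("float64")
    median = num.iloc[train_idx].median()
    filled = num.fillna(median)
    mean = filled.iloc[train_idx].mean()
    std = filled.iloc[train_idx].std().replace(0, 1.0)  # guard constant columns
    scaled = (filled - mean) / std

    features = np.hstack([scaled.to_numpy(), cat.to_numpy()]).astype("float32")
    return features[train_idx], y[train_idx], features[test_idx], y[test_idx]
```

With the frame loaded above, a call would look like `X_train, y_train, X_test, y_test = preprocess(data)`. Computing the scaling statistics on the training rows only keeps information about the test rows out of the inputs the model is trained on.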

---

## Part 2: Build and Train the Model

**Initial Model Architecture**:


- *Input layer:* Size matches the number of features after encoding.
- *Hidden layers:*
  - Hidden Layer 1: 8 units, ReLU activation.
  - Hidden Layer 2: 8 units, ReLU activation.
- *Output layer:* 1 unit (binary classification), Sigmoid activation.


**Loss Function and Optimizer**:
- Loss: Binary Cross-Entropy Loss (`BCELoss`).
- Optimizer: SGD with a learning rate of 0.001.

**Training**:
- Train the model for 20 epochs.
- Record (store) the training and test losses at each epoch.


In [1]:
# your code here
# hint: recall how we did this previously
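
The architecture and training settings listed above could be written in PyTorch roughly as follows. This is a sketch, not the reference solution: the function names `make_model` and `train`, the mini-batch size of 64, and the synthetic stand-in data are all assumptions made so the block runs on its own. Replace the random tensors with the preprocessed Adult features and labels.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def make_model(n_features):
    """Two hidden layers of 8 ReLU units and a single sigmoid output."""
    return nn.Sequential(
        nn.Linear(n_features, 8), nn.ReLU(),
        nn.Linear(8, 8), nn.ReLU(),
        nn.Linear(8, 1), nn.Sigmoid(),
    )


def train(model, X_train, y_train, X_test, y_test, epochs=20, lr=0.001, batch_size=64):
    """Train with SGD and BCELoss; return per-epoch (train_losses, test_losses)."""
    loss_fn = nn.BCELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loader = DataLoader(TensorDataset(X_train, y_train), batch_size=batch_size, shuffle=True)
    train_losses, test_losses = [], []
    for _ in range(epochs):
        model.train()
        running = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb).squeeze(1), yb)
            loss.backward()
            optimizer.step()
            running += loss.item() * len(xb)  # weight each batch by its size
        train_losses.append(running / len(X_train))

        model.eval()
        with torch.no_grad():
            test_losses.append(loss_fn(model(X_test).squeeze(1), y_test).item())
    return train_losses, test_losses


# Synthetic stand-in data (assumption): 200 rows, 5 features, label from the first feature.
torch.manual_seed(0)
X = torch.randn(200, 5)
y = (X[:, 0] > 0).float()
model = make_model(n_features=5)
train_losses, test_losses = train(model, X[:180], y[:180], X[180:], y[180:])
```

Note that the training loss recorded here is an average over the mini-batches seen during the epoch, while the weights are still changing, whereas the test loss is measured once at the end of the epoch. This can make the two curves look slightly offset.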

---

## Part 3: Visualize and Interpret Loss Curves

**Tasks**:
1. Plot the training and test loss curves.
2. Answer the following questions:
   - Is the model underfitting, overfitting, or neither? Provide evidence.
   - If overfitting occurs, at what epoch does it begin?
   - What can you infer about the model's performance from the loss curves?

In [None]:
# your code here
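
A plotting helper along these lines may be enough for the tasks above. The names `plot_losses` and `best_test_epoch` are hypothetical, and the two loss lists are made-up numbers used only to exercise the code, not results from this dataset. The epoch with the lowest test loss is one rough marker for where overfitting starts, provided the training loss keeps falling afterwards.

```python
import matplotlib.pyplot as plt


def plot_losses(train_losses, test_losses, title="Loss curves"):
    """Draw both curves on one set of axes and return the axes."""
    epochs = range(1, len(train_losses) + 1)
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(epochs, train_losses, label="train")
    ax.plot(epochs, test_losses, label="test")
    ax.set_xlabel("epoch")
    ax.set_ylabel("BCE loss")
    ax.set_title(title)
    ax.legend()
    return ax


def best_test_epoch(test_losses):
    """Return the 1-based epoch at which the test loss is lowest."""
    return min(range(len(test_losses)), key=test_losses.__getitem__) + 1


# Illustrative values only (assumption): test loss turns upward after epoch 4.
example_train = [0.70, 0.55, 0.45, 0.38, 0.33, 0.29]
example_test = [0.71, 0.58, 0.50, 0.48, 0.49, 0.52]
ax = plot_losses(example_train, example_test)
turning_point = best_test_epoch(example_test)
```

In a notebook the figure renders at the end of the cell; in a script, call `plt.show()` afterwards.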

---

## Part 4: Experimentation with Hyperparameters

**Experiment**: Varying model capacity
- Now train models with different hidden layer sizes (width & depth):
  - **Smaller model:** 
    - Try architectures with LESS representational capacity than the initial model.
  - **Larger model:** 
    - Try architectures with MORE representational capacity than the initial model.

- Train each architecture for 20 epochs with the same initial learning rate, and plot its loss curves.

**Questions:**
1. How does changing the model size affect training and test loss?
2. What interesting findings did you make? Be prepared to share them with your classmates (just show the loss curves and explain your findings).

In [None]:
# your code here
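
The capacity experiment can be organized around a builder that takes a list of hidden-layer widths, as sketched below. Everything named here is an assumption for illustration: the helpers `make_mlp`, `count_parameters`, and `fit`, the three example configurations, and the synthetic data. To stay short, `fit` takes one full-batch gradient step per epoch, which is a simplification of mini-batch training.

```python
import torch
from torch import nn


def make_mlp(n_features, hidden_sizes):
    """Build an MLP with one ReLU layer per entry in hidden_sizes and a sigmoid output."""
    layers, width_in = [], n_features
    for width in hidden_sizes:
        layers += [nn.Linear(width_in, width), nn.ReLU()]
        width_in = width
    layers += [nn.Linear(width_in, 1), nn.Sigmoid()]
    return nn.Sequential(*layers)


def count_parameters(model):
    """Number of trainable weights and biases, a rough proxy for capacity."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def fit(model, X_tr, y_tr, X_te, y_te, epochs=20, lr=0.001):
    """Full-batch SGD (simplification); returns per-epoch train and test losses."""
    loss_fn = nn.BCELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    history = {"train": [], "test": []}
    for _ in range(epochs):
        model.train()
        optimizer.zero_grad()
        loss = loss_fn(model(X_tr).squeeze(1), y_tr)
        loss.backward()
        optimizer.step()
        model.eval()
        with torch.no_grad():
            history["train"].append(loss.item())
            history["test"].append(loss_fn(model(X_te).squeeze(1), y_te).item())
    return history


# Example configurations (assumption): one narrower/shallower, one wider/deeper.
configs = {"smaller": [2], "initial": [8, 8], "larger": [64, 64, 64]}

# Synthetic stand-in data (assumption): 200 rows, 10 features.
torch.manual_seed(0)
X = torch.randn(200, 10)
y = (X[:, 0] * X[:, 1] > 0).float()

histories = {}
for name, hidden in configs.items():
    model = make_mlp(10, hidden)
    histories[name] = fit(model, X[:180], y[:180], X[180:], y[180:])
    print(f"{name}: hidden={hidden}, parameters={count_parameters(model)}")
```

Keeping the learning rate and epoch count fixed across configurations means any difference between the curves in `histories` can be attributed to the architecture (and the random initialization) rather than to the training settings.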

---
## Part 5: Experimentation with Learning Rate


**Experiment**: Periodic reduction of learning rate
- In class we learned that dropping the learning rate after a period of training can lead to better results. Can you replicate that?

**Question:**
1. Did you succeed? If so, why did it work?


In [None]:
# your code here
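
One way to drop the learning rate at fixed intervals is PyTorch's `torch.optim.lr_scheduler.StepLR`, sketched below. The starting rate of 0.1, the drop every 10 epochs, the factor of 0.1, the 30-epoch run, and the synthetic data are assumptions chosen to make the schedule visible, not values from the assignment.

```python
import torch
from torch import nn

# Synthetic stand-in data (assumption): 256 rows, 5 features.
torch.manual_seed(0)
X = torch.randn(256, 5)
y = (X[:, 0] + X[:, 1] > 0).float()

model = nn.Sequential(nn.Linear(5, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Multiply the learning rate by gamma once every step_size epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

lrs, losses = [], []
for epoch in range(30):
    optimizer.zero_grad()
    loss = loss_fn(model(X).squeeze(1), y)
    loss.backward()
    optimizer.step()

    lrs.append(optimizer.param_groups[0]["lr"])  # rate used for this epoch's update
    losses.append(loss.item())

    scheduler.step()  # advance the schedule once per epoch, after optimizer.step()
```

Plotting `losses` next to `lrs` shows whether each drop coincides with a change in how the loss behaves. A larger rate early on moves quickly toward a good region, while a smaller rate later takes finer steps and reduces the noise in the updates.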