## From data loading to running a forward pass
In this exercise, you'll create a PyTorch DataLoader from a pandas DataFrame and call a model on this dataset. Specifically, you'll run a forward pass on a neural network. You'll continue working with fully connected neural networks, as you have done so far.

You'll begin by subsetting a loaded DataFrame called dataframe, converting the features and targets to NumPy arrays, and then converting those arrays to PyTorch tensors in order to create a PyTorch dataset.

This dataset can be loaded into a PyTorch DataLoader, batched, shuffled, and used to run a forward pass on a custom fully connected neural network.

NumPy as np, pandas as pd, torch, TensorDataset(), and DataLoader() have been imported for you.

In [1]:
""" 
The DataFrame in DataCamp looks like this:
         ph  Hardness  Solids  Chloramines  Sulfate  Conductivity  Organic_carbon  Trihalomethanes  Turbidity  Potability
0     0.587     0.578   0.386        0.568    0.647         0.293           0.655            0.795      0.630           0
1     0.644     0.441   0.314        0.439    0.515         0.357           0.377            0.203      0.520           0
2     0.389     0.471   0.506        0.524    0.562         0.143           0.250            0.401      0.220           0
3     0.726     0.716   0.506        0.522    0.752         0.149           0.467            0.659      0.242           0
4     0.611     0.533   0.238        0.270    0.495         0.495           0.410            0.470      0.585           0
...     ...       ...     ...          ...      ...           ...             ...              ...        ...         ...
2006  0.636     0.581   0.278        0.418    0.522         0.342           0.310            0.403      0.627           1
2007  0.470     0.549   0.301        0.538    0.499         0.231           0.565            0.176      0.395           1
2008  0.818     0.087   0.656        0.671    0.369         0.432           0.563            0.286      0.579           1
2009  0.424     0.464   0.460        0.542    0.616         0.388           0.398            0.449      0.440           1
2010  0.322     0.493   0.841        0.492    0.656         0.589           0.471            0.503      0.592           1

[2011 rows x 10 columns]"""

In [3]:
"""Here, I tried to write my own version:"""
import pandas as pd

# Provided data
data = {
    'ph': [0.587, 0.644, 0.389, 0.726, 0.611],
    'Hardness': [0.578, 0.441, 0.471, 0.716, 0.533],
    'Solids': [0.386, 0.314, 0.506, 0.506, 0.238],
    'Chloramines': [0.568, 0.439, 0.524, 0.522, 0.270],
    'Sulfate': [0.647, 0.515, 0.562, 0.752, 0.495],
    'Conductivity': [0.293, 0.357, 0.143, 0.149, 0.495],
    'Organic_carbon': [0.655, 0.377, 0.250, 0.467, 0.410],
    'Trihalomethanes': [0.795, 0.203, 0.401, 0.659, 0.470],
    'Turbidity': [0.630, 0.520, 0.220, 0.242, 0.585],
    'Potability': [0, 0, 0, 0, 0]
}

"""Of course, you could also generate random data
with up to 2000 rows (after importing NumPy as np), using
np.random.rand(2000).tolist() for each feature column
and
np.random.randint(0, 2, 2000).tolist() for the binary labels."""

# Convert to Pandas DataFrame
dataframe = pd.DataFrame(data)

# Display the DataFrame
print(dataframe)


      ph  Hardness  Solids  Chloramines  Sulfate  Conductivity  \
0  0.587     0.578   0.386        0.568    0.647         0.293   
1  0.644     0.441   0.314        0.439    0.515         0.357   
2  0.389     0.471   0.506        0.524    0.562         0.143   
3  0.726     0.716   0.506        0.522    0.752         0.149   
4  0.611     0.533   0.238        0.270    0.495         0.495   

   Organic_carbon  Trihalomethanes  Turbidity  Potability  
0           0.655            0.795      0.630           0  
1           0.377            0.203      0.520           0  
2           0.250            0.401      0.220           0  
3           0.467            0.659      0.242           0  
4           0.410            0.470      0.585           0  
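The random-data idea mentioned in the cell above can be sketched as follows. This is only an illustration: the names `feature_columns`, `n_rows`, and `synthetic_dataframe` are made up for this example, the seed is arbitrary, and the random values carry no real water-quality meaning.

```python
import numpy as np
import pandas as pd

# Column names copied from the DataFrame shown above
feature_columns = ['ph', 'Hardness', 'Solids', 'Chloramines', 'Sulfate',
                   'Conductivity', 'Organic_carbon', 'Trihalomethanes', 'Turbidity']

n_rows = 2000          # assumed size, close to the 2011 rows of the original data
np.random.seed(0)      # arbitrary seed, used only to make the sketch reproducible

# Uniform random values in [0, 1) for every feature column
synthetic = {name: np.random.rand(n_rows).tolist() for name in feature_columns}

# Random binary labels (0 or 1) for the target column
synthetic['Potability'] = np.random.randint(0, 2, n_rows).tolist()

synthetic_dataframe = pd.DataFrame(synthetic)
print(synthetic_dataframe.shape)  # (2000, 10)
```

A DataFrame built this way has the same column layout as the original, so the later cells should run on it unchanged if it is assigned to dataframe.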


In [2]:
import torch
import numpy as np
import pandas as pd
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

- Extract the features (ph, Sulfate, Conductivity, Organic_carbon) and target (Potability) values and load them into the appropriate tensors to represent features and targets.
- Use both tensors to create a PyTorch dataset using the dataset class that's quickest to use when tensors don't require any additional preprocessing.

In [4]:
# Load the different columns into two PyTorch tensors
features = torch.tensor(np.array(dataframe[['ph','Sulfate','Conductivity','Organic_carbon']])).float()
target = torch.tensor(np.array(dataframe['Potability'])).float()

# Create a dataset from the two generated tensors
dataset = TensorDataset(features, target)
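To see what a TensorDataset holds, it can help to index into one. The sketch below uses a tiny stand-in dataset built from the first two rows of the sample DataFrame; the names `toy_features`, `toy_target`, and `toy_dataset` are hypothetical and exist only for this illustration.

```python
import torch
from torch.utils.data import TensorDataset

# First two rows of the sample DataFrame (ph, Sulfate, Conductivity, Organic_carbon)
toy_features = torch.tensor([[0.587, 0.647, 0.293, 0.655],
                             [0.644, 0.515, 0.357, 0.377]])
toy_target = torch.tensor([0.0, 0.0])

toy_dataset = TensorDataset(toy_features, toy_target)

# The dataset length equals the number of rows in the tensors
print(len(toy_dataset))  # 2

# Indexing returns one (features, target) pair
sample_features, sample_target = toy_dataset[0]
print(sample_features.shape)  # torch.Size([4])
print(sample_target)          # tensor(0.)
```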


- Create a PyTorch DataLoader from the created TensorDataset; this DataLoader should use a batch_size of two and shuffle the dataset.

In [5]:
# Create a dataloader using the above dataset
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)
x, y = next(iter(dataloader))
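The cell above pulls a single batch. As a rough sketch of what iterating over the whole DataLoader looks like, the example below uses five made-up samples to mirror the five-row DataFrame; `demo_features`, `demo_target`, and `demo_loader` are hypothetical names. With five samples and a batch_size of two, the last batch holds the single leftover sample.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Five made-up samples with four features each
demo_features = torch.rand(5, 4)
demo_target = torch.zeros(5)
demo_loader = DataLoader(TensorDataset(demo_features, demo_target),
                         batch_size=2, shuffle=True)

# Record how many samples each batch contains
batch_sizes = []
for batch_x, batch_y in demo_loader:
    batch_sizes.append(batch_x.shape[0])
    print(batch_x.shape, batch_y.shape)

print(batch_sizes)  # [2, 2, 1]
```

Shuffling changes which samples land in each batch, not the batch sizes.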

- Implement a small, fully connected neural network using exactly two linear layers and the nn.Sequential() API, where the final output size is 1.

In [6]:
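# Create a model using the nn.Sequential API:
# 4 input features -> 2 hidden units -> 1 output
model = nn.Sequential(
    nn.Linear(4, 2),
    nn.Linear(2, 1),
)

# Run a forward pass on the full features tensor
output = model(features)
print(output)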

tensor([[0.4549],
        [0.3141],
        [0.4358],
        [0.4166],
        [0.3012]], grad_fn=<AddmmBackward0>)
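The forward pass above runs on the full features tensor at once. A batched version, feeding the model one DataLoader batch at a time, might look like the sketch below. It uses made-up data and hypothetical names (`batch_features`, `batch_loader`, `batch_model`, `all_outputs`), and it wraps the loop in torch.no_grad(), so unlike the output shown above the results carry no grad_fn.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)  # arbitrary seed, only for reproducibility of the sketch

# Five made-up samples with four features each
batch_features = torch.rand(5, 4)
batch_target = torch.zeros(5)
batch_loader = DataLoader(TensorDataset(batch_features, batch_target),
                          batch_size=2, shuffle=True)

# Same architecture as the model above: two linear layers, final output size 1
batch_model = nn.Sequential(nn.Linear(4, 2), nn.Linear(2, 1))

batch_outputs = []
with torch.no_grad():  # gradients are not needed for a forward pass alone
    for x, y in batch_loader:
        out = batch_model(x)  # shape: (samples in this batch, 1)
        batch_outputs.append(out)

# Stack the per-batch outputs back into one tensor
all_outputs = torch.cat(batch_outputs)
print(all_outputs.shape)  # torch.Size([5, 1])
```

Because the loader shuffles, the rows of all_outputs follow the shuffled order rather than the original row order.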
