#### Step 1: Importing the Necessary Libraries

In [1]:
pip install keras

Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install tensorflow

Note: you may need to restart the kernel to use updated packages.


In [3]:
import keras
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
import pandas as pd
import sklearn
from sklearn.preprocessing import StandardScaler




#### Step 2: Load the data

In [4]:
seed = 7
data = pd.read_csv("US_Border_Crossing_Entry_Data.csv")


In [5]:
data.head(10)

Unnamed: 0,Port Name,State,Port Code,Border,Date,Measure,Value
0,Alcan,AK,3104,US-Canada Border,2/1/2020 00:00,Personal Vehicle Passengers,1414
1,Alcan,AK,3104,US-Canada Border,2/1/2020 00:00,Personal Vehicles,763
2,Alcan,AK,3104,US-Canada Border,2/1/2020 00:00,Truck Containers Empty,412
3,Alcan,AK,3104,US-Canada Border,2/1/2020 00:00,Truck Containers Full,122
4,Alcan,AK,3104,US-Canada Border,2/1/2020 00:00,Trucks,545
5,Alexandria Bay,NY,708,US-Canada Border,2/1/2020 00:00,Bus Passengers,1174
6,Alexandria Bay,NY,708,US-Canada Border,2/1/2020 00:00,Buses,36
7,Alexandria Bay,NY,708,US-Canada Border,2/1/2020 00:00,Personal Vehicle Passengers,68630
8,Alexandria Bay,NY,708,US-Canada Border,2/1/2020 00:00,Personal Vehicles,31696
9,Alexandria Bay,NY,708,US-Canada Border,2/1/2020 00:00,Truck Containers Empty,1875


#### Step 3: EDA and Feature Engineering
Convert 'Date' to more useful features like year and month.

In [6]:
# Remove duplicates; .copy() makes data_cleaned an independent DataFrame,
# which avoids pandas' SettingWithCopyWarning when columns are added below
data_cleaned = data.drop_duplicates().copy()


In [7]:
data_cleaned['Date'] = pd.to_datetime(data_cleaned['Date'])
data_cleaned['Year'] = data_cleaned['Date'].dt.year
data_cleaned['Month'] = data_cleaned['Date'].dt.month


#### Step 4: Encode Categorical Variables
One-hot encode categorical variables such as 'Port Name', 'State', 'Border', and 'Measure'.

In [12]:
categorical_columns = ['Port Name', 'State', 'Border', 'Measure']
data_encoded = pd.get_dummies(data_cleaned, columns=categorical_columns)
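# Preview the original frame for reference; use data_encoded.head(4) to inspect the one-hot columns
data.head(4)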

Unnamed: 0,Port Name,State,Port Code,Border,Date,Measure,Value
0,Alcan,AK,3104,US-Canada Border,2/1/2020 00:00,Personal Vehicle Passengers,1414
1,Alcan,AK,3104,US-Canada Border,2/1/2020 00:00,Personal Vehicles,763
2,Alcan,AK,3104,US-Canada Border,2/1/2020 00:00,Truck Containers Empty,412
3,Alcan,AK,3104,US-Canada Border,2/1/2020 00:00,Truck Containers Full,122
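
To see what the encoding itself produces, here is a minimal sketch on a three-row stand-in frame. The values are taken from the earlier `data.head(10)` output; the names `sample` and `sample_encoded` are illustrative and not part of the notebook.

```python
import pandas as pd

# Tiny stand-in for the border-crossing data (illustrative only).
sample = pd.DataFrame({
    "State": ["AK", "AK", "NY"],
    "Measure": ["Trucks", "Personal Vehicles", "Buses"],
    "Value": [545, 763, 36],
})

# Each listed column is replaced by one indicator column per distinct category.
sample_encoded = pd.get_dummies(sample, columns=["State", "Measure"])
print(sample_encoded.columns.tolist())
```

Columns that are not listed (here `Value`) are kept unchanged, while every category becomes its own column, so a column with many distinct values such as 'Port Name' can widen the frame considerably.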


#### Step 5: Log Transformation and Scaling
Apply a logarithmic transformation to the 'Value' column to address skewness, then normalize the values.

In [9]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Logarithmic transformation; log1p computes log(1 + x), so a count of 0 maps to 0
data_encoded['Value_Log'] = np.log1p(data_encoded['Value'])

# Min-max normalization to the range [0, 1]
scaler = MinMaxScaler()
data_encoded['Value_Log_Norm'] = scaler.fit_transform(data_encoded[['Value_Log']]).ravel()

# Drop 'Value' and the intermediate 'Value_Log' so only the normalized target remains
data_prepared = data_encoded.drop(['Value', 'Value_Log'], axis=1)
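
Because the target is now on a log-and-scaled axis, any prediction has to be mapped back before it can be read as a crossing count. The sketch below shows the round trip on a small stand-in array (values from the preview rows); the variable names are assumptions for illustration, not part of the notebook.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Stand-in for the raw 'Value' column (illustrative values).
values = np.array([1414, 763, 412, 122, 545], dtype=float)

# Forward: log1p, then min-max scaling to [0, 1], mirroring the cell above.
value_log = np.log1p(values).reshape(-1, 1)
scaler = MinMaxScaler()
value_log_norm = scaler.fit_transform(value_log)

# Inverse: undo the scaling first, then undo log1p with expm1.
recovered = np.expm1(scaler.inverse_transform(value_log_norm)).ravel()
print(np.allclose(recovered, values))
```

The same fitted `scaler` object must be kept around for this to work, since it stores the minimum and maximum it saw during `fit_transform`.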


#### Step 6: Splitting the Data
Split the data into training and testing sets.
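
A minimal sketch of such a split, assuming the target is the `Value_Log_Norm` column and every other column is a numeric feature. The small `prepared` frame is a hypothetical stand-in for `data_prepared`; on the real frame, non-numeric columns such as 'Date' would also need to be dropped from the features. The `random_state` reuses the notebook's `seed = 7`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for data_prepared: numeric features plus the target.
prepared = pd.DataFrame({
    "Year": [2020] * 10,
    "Month": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "State_AK": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "Value_Log_Norm": [0.1, 0.4, 0.3, 0.9, 0.5, 0.2, 0.7, 0.6, 0.8, 0.0],
})

# Separate the features from the target.
X = prepared.drop(columns=["Value_Log_Norm"])
y = prepared["Value_Log_Norm"]

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7
)
print(X_train.shape, X_test.shape)
```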



#### Step 7: The ANN Training Function

In [11]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split

def train_ann_model_with_random_data(num_features=10, num_samples=1000):
    """
    Train an artificial neural network model on randomly generated data.

    Parameters:
    - num_features: Number of features to generate for the dataset.
    - num_samples: Total number of samples to generate for the dataset.

    Returns:
    - model: The trained TensorFlow model.
    - history: Training history object containing training and validation loss.
    """
    # Generate synthetic dataset
    X = np.random.rand(num_samples, num_features)
    y = np.random.rand(num_samples, 1)

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Define the ANN model architecture
    model = Sequential([
        tf.keras.Input(shape=(num_features,)),  # Explicit input layer; recent Keras versions prefer this over input_dim
        Dense(128, activation='relu'),
        Dense(64, activation='relu'),
        Dense(32, activation='relu'),
        Dense(1, activation='linear')  # Output layer for regression
    ])

    # Compile the model
    model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error')

    # Train the model
    history = model.fit(X_train, y_train, validation_split=0.2, epochs=10, batch_size=64, verbose=1)

    # Evaluate the model on the test set
    test_loss = model.evaluate(X_test, y_test, verbose=1)
    print(f'Test loss: {test_loss}')

    return model, history

# Example usage (trains on synthetic random data, not on the border-crossing data prepared above)
model, history = train_ann_model_with_random_data(num_features=10, num_samples=1000)



Epoch 1/10

Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test loss: 0.0874585509300232


#### Simplified Explanation of Test Loss

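The test loss of about 0.0875 (0.0874585509300232 in the run above) measures how well our ANN model predicted the answers for data it had not seen during training. Because the model was compiled with `loss='mean_squared_error'`, this number is the mean squared error (MSE) on the test set. The exact value changes slightly from run to run because both the generated data and the model's initial weights are random.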

##### What's MSE?
MSE checks how far the program's guesses are from the real answers: take each difference, square it, and average the results. It is a standard way to judge how well a regression model predicts.

##### What does the 0.0875 number mean?
It is the average of the squared mistakes. Taking its square root gives a typical mistake of about 0.30 in the units of the target. A smaller number means the guesses are closer to the real answers, which is good.
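
As a small worked example of the calculation, the sketch below computes the MSE by hand for three made-up predictions (the numbers are illustrative and do not come from the model):

```python
import numpy as np

# Made-up real answers and guesses, for illustration only.
y_true = np.array([0.2, 0.5, 0.9])
y_pred = np.array([0.3, 0.5, 0.6])

# Square each difference, then average.
squared_errors = (y_true - y_pred) ** 2   # 0.01, 0.00, 0.09
mse = squared_errors.mean()               # (0.01 + 0.00 + 0.09) / 3
print(mse)
```

The single large miss (0.3) contributes nine times as much as the small one (0.1), which is why MSE penalizes big mistakes more heavily than small ones.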

##### Understanding the number:
Generally, a smaller MSE is better because it means the guesses are closer to the real answers. How small it needs to be depends on what we are predicting and on the scale of the target.
When two programs predict the same thing on the same data, the one with the smaller MSE is usually doing the better job.
On its own, 0.0875 does not tell us whether the model is good. In Step 7 the targets are random numbers between 0 and 1 that have no relationship to the features, and always guessing 0.5 on such targets gives an MSE of about 1/12 ≈ 0.083. A test loss of 0.0875 therefore means the model does no better than that constant guess, which is expected because random targets contain nothing to learn.
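
One way to put a loss value in context is to compare it with a trivial baseline that always predicts the mean of the targets. The sketch below does this for targets drawn uniformly from [0, 1), the same distribution that `np.random.rand` produces in Step 7; the seed and sample size are arbitrary choices for illustration.

```python
import numpy as np

# Targets drawn uniformly from [0, 1), like y = np.random.rand(...) in Step 7.
rng = np.random.default_rng(0)
y = rng.random(100_000)

# Baseline "model": always predict the mean of the targets.
baseline_pred = np.full_like(y, y.mean())
baseline_mse = np.mean((y - baseline_pred) ** 2)

# The variance of a uniform [0, 1) variable is 1/12, about 0.0833.
print(round(baseline_mse, 4), round(1 / 12, 4))
```

A model is only learning something useful if its test MSE is clearly below this kind of baseline.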

##### Why compare different mistake levels?
It's important to check whether our program is good only with data it has seen before (the training data) and not with new data (such as the test set). If it does well on training data but poorly on new data, it is overfitting, which is like memorizing a test but failing in real life. If it does poorly on both, it is underfitting: it did not really learn what it should have.
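
The comparison can be sketched with a small helper that looks at the final training and validation losses. `diagnose_fit` and its `gap_ratio` threshold are hypothetical, a rough rule of thumb rather than a standard test, and the loss curves are made up; in the notebook they would come from `history.history["loss"]` and `history.history["val_loss"]`.

```python
def diagnose_fit(train_loss, val_loss, gap_ratio=1.5):
    """Rough heuristic: flag a large gap between final training and validation loss."""
    final_train, final_val = train_loss[-1], val_loss[-1]
    if final_val > gap_ratio * final_train:
        return "possible overfitting"
    return "no large train/validation gap"

# Made-up loss curves, one value per epoch (illustration only).
overfit_result = diagnose_fit([0.20, 0.10, 0.05], [0.21, 0.15, 0.14])
balanced_result = diagnose_fit([0.20, 0.12, 0.09], [0.21, 0.13, 0.10])
print(overfit_result)
print(balanced_result)
```

A small gap does not by itself mean the model is good: both losses can be equally poor, which is the underfitting case and is better caught by comparing against a baseline.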

##### In short
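The test loss tells us how accurate the model's predictions are on unseen data. The value of about 0.0875 reported above comes from the randomly generated data in Step 7, not from the US border crossing dataset, so it does not yet say anything about forecasting border crossings. To assess that, the model needs to be trained and tested on the prepared data (`data_prepared`). A test loss clearly below a simple baseline on that data would make the model a potentially useful tool for forecasting border crossings and aiding decision-making.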