# Name, etc

# Introduction

This CA consists of two parts: Neural Networks and Semantic Analysis.

### Neural Networks

You are required to take the data stored in the file “BankRecords.csv”, available on Moodle, and process the data into a DataFrame.
You are then required to train a Neural Network to predict the “Income(Thousands’)” of the customers, including tuning the network to achieve the best results.
You must also compare your neural network to a standard ML regressor of your choosing and discuss your findings in the context of the problem at hand.

### Semantic Analysis

You are required to source text data from any social media platform on any topic that you choose and perform semantic analysis on the text. This analysis should provide a visualization of the overall sentiment of your text data, showing the positive, neutral, and negative sentiment expressed. You will require at least 1000 text observations.
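
As a rough illustration of the positive/neutral/negative breakdown described above, the sketch below buckets polarity scores into three classes and counts them. It assumes each text already has a polarity score in [-1, 1] from some sentiment tool. The `scores` list, the `label_sentiment` helper, and the 0.05 threshold are hypothetical choices for illustration, not part of the assignment.

```python
from collections import Counter

# Hypothetical polarity scores in [-1, 1], one per text observation.
scores = [0.62, -0.40, 0.01, 0.00, -0.03, 0.75, -0.81, 0.10]

def label_sentiment(score, threshold=0.05):
    """Map a polarity score to 'positive', 'neutral' or 'negative'."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

labels = [label_sentiment(s) for s in scores]
counts = Counter(labels)

# Share of each class, ready to feed into a bar or pie chart.
shares = {
    label: counts[label] / len(labels)
    for label in ("positive", "neutral", "negative")
}
print(counts)
print(shares)
```

The `shares` dictionary is the kind of summary a bar or pie chart of overall sentiment would be drawn from.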

In [4]:
%pip install matplotlib

In [3]:
# Import libraries needed

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler

from keras.models import Sequential
from keras.layers import Dense

In [None]:
data = pd.read_csv("C:/Users/henri/Documents/GitHub/CA-Machine-Learning/BankRecords.csv")
data.head()

In [None]:
# Check for missing values
missing_values = data.isnull().sum()
print(missing_values)

# Check for duplicates
duplicates = data.duplicated().sum()
print(f"Number of duplicate rows: {duplicates}")
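
Once the counts are known, one possible follow-up is to drop duplicate rows and rows with missing values. The sketch below uses a small made-up DataFrame (`toy`, with hypothetical values) rather than the bank data. Dropping is only one option; imputing may be preferable when many rows are affected.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the bank records.
toy = pd.DataFrame({
    "Age": [25, 40, 40, np.nan, 31],
    "Income": [30.0, 85.0, 85.0, 52.0, 47.0],
})

print(toy.isnull().sum())      # missing values per column
print(toy.duplicated().sum())  # number of fully duplicated rows

# Drop exact duplicates first, then any row that still has a missing value.
cleaned = toy.drop_duplicates().dropna().reset_index(drop=True)
print(cleaned)
```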

In [None]:
# Create correlation matrix (numeric_only=True skips any non-numeric columns)
matrix = data.corr(numeric_only=True)
print(matrix)

In [None]:
# Encode categorical variables
label_encoders = {}
categorical_features = ['Education', 'Personal Loan', 'Securities Account', 'CD Account', 'Online Banking', 'CreditCard']
for col in categorical_features:
    le = LabelEncoder()
    data[col] = le.fit_transform(data[col])
    label_encoders[col] = le

# Standardize numerical features
scaler = StandardScaler()
numerical_features = ["Age", "Experience(Years)", "Credit Score", "Mortgage(Thousands's)"]
data[numerical_features] = scaler.fit_transform(data[numerical_features])
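
One caveat with scaling before the train/test split is that the scaler also sees the test rows. A minimal sketch of an alternative follows, assuming a toy DataFrame with hypothetical columns `Age`, `Mortgage` and `Education`: split first, then fit a `ColumnTransformer` on the training rows only and reuse it on the test rows.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; column names and values are for illustration only.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Age": rng.integers(20, 65, size=100),
    "Mortgage": rng.uniform(0, 300, size=100),
    "Education": rng.choice(["Degree", "Masters", "Diploma"], size=100),
})
target = rng.uniform(20, 150, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(
    toy, target, test_size=0.2, random_state=42
)

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["Age", "Mortgage"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Education"]),
])

# Fit on the training rows only, then apply the same transform to the test rows.
X_tr_prepared = preprocess.fit_transform(X_tr)
X_te_prepared = preprocess.transform(X_te)
print(X_tr_prepared.shape, X_te_prepared.shape)
```

One-hot encoding is used here instead of `LabelEncoder` because it does not impose an ordering on categories; for two-valued columns the two approaches carry the same information.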

In [None]:
# Split the data into training and testing sets
X = data.drop(["ID", "Income(Thousands's)", "Sort Code"], axis=1)
y = data["Income(Thousands's)"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear Regression

In [None]:
# Linear Regression

# Initialize the model
linear_model = LinearRegression()

# Train the model
linear_model.fit(X_train, y_train)

# Predictions
y_pred = linear_model.predict(X_test)

# Evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Linear Regression MSE: {mse}")
print(f"Linear Regression R2 Score: {r2}")
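
MSE is in squared units, so it can help to also report RMSE, which is back in thousands, and to compare against a trivial baseline that always predicts the mean. The sketch below uses hypothetical `actual` and `predicted` arrays rather than the notebook's results.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical values (in thousands), for illustration only.
actual = np.array([40.0, 55.0, 70.0, 90.0, 120.0])
predicted = np.array([45.0, 50.0, 75.0, 85.0, 110.0])

toy_mse = mean_squared_error(actual, predicted)
toy_rmse = np.sqrt(toy_mse)  # same units as the target
toy_r2 = r2_score(actual, predicted)

# Baseline: always predict the mean of the actual values.
baseline = np.full_like(actual, actual.mean())
baseline_rmse = np.sqrt(mean_squared_error(actual, baseline))

print(f"MSE: {toy_mse:.2f}, RMSE: {toy_rmse:.2f}, R2: {toy_r2:.3f}")
print(f"Baseline RMSE: {baseline_rmse:.2f}")
```

A model is only useful if its RMSE is clearly below the baseline's, which is the same comparison the R2 score expresses.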

In [None]:
plt.figure(figsize=(10, 6))
plt.plot(range(len(y_test)), y_test, color='blue', label='Actual Income')
plt.plot(range(len(y_test)), y_pred, color='red', linestyle='--', label='Predicted Income')
plt.xlabel('Test sample index')
plt.ylabel('Income (Thousands)')
plt.title('Actual and Predicted Income')
plt.legend()
plt.show()

# Artificial Neural Network
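
A minimal sketch of what a regression network for this section could look like, using synthetic data in place of the bank records. The layer sizes, epoch count, and batch size are arbitrary starting points rather than tuned values, and `X_demo` / `y_demo` are hypothetical stand-ins for the prepared features and the income column.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from tensorflow import keras

# Synthetic stand-in data: 500 rows, 6 already-scaled features.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(500, 6)).astype("float32")
weights = np.array([3.0, -2.0, 0.5, 0.0, 1.5, -1.0])
y_demo = (X_demo @ weights + rng.normal(scale=0.1, size=500)).astype("float32")

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Small fully connected network with a single linear output for regression.
model = keras.Sequential([
    keras.Input(shape=(X_tr.shape[1],)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Hold back 20% of the training rows to monitor validation loss.
history = model.fit(
    X_tr, y_tr, validation_split=0.2, epochs=50, batch_size=32, verbose=0
)

nn_pred = model.predict(X_te, verbose=0).ravel()
print(f"NN MSE: {mean_squared_error(y_te, nn_pred):.3f}")
print(f"NN R2 Score: {r2_score(y_te, nn_pred):.3f}")
```

Tuning would then mean varying the layer sizes, learning rate, and number of epochs while watching `history.history["val_loss"]`, and reporting the same MSE and R2 metrics used for the linear regression so the two models can be compared directly.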

# References

https://www.geeksforgeeks.org/create-a-correlation-matrix-using-python/ (28/05)
https://www.analyticsvidhya.com/blog/2021/10/implementing-artificial-neural-networkclassification-in-python-from-scratch/ (29/05)