**1. Neural Networks**

*Installing the needed libraries*

In [None]:
%pip install pandas scikit-learn tensorflow

Importing the libraries

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

Step 1: Data Preparation

In [None]:
# Load the dataset
data = pd.read_csv('BankRecords.csv')

# Display the first few rows of the dataset
print(data.head())

# Display column types and non-null counts (info() prints directly and returns None)
data.info()

# Define numeric features for scaling
numeric_features = ['Age', 'Experience(Years)', 'Family', 'Credit Score', 'Mortgage(Thousands\'s)']

# Fill missing values in numeric features with the mean value of each column
data[numeric_features] = data[numeric_features].fillna(data[numeric_features].mean())

# Define categorical features for one-hot encoding
categorical_features = ['Education', 'Personal Loan', 'Securities Account', 'CD Account', 'Online Banking', 'CreditCard']

# Create a preprocessor with transformers for numeric and categorical features;
# handle_unknown='ignore' encodes categories unseen during fitting as all zeros
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ])

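# Define the feature matrix (X) and target vector (y)
X = data.drop(['ID', 'Income(Thousands\'s)'], axis=1)
y = data['Income(Thousands\'s)']

# Split the dataset into training and testing sets before fitting the preprocessor,
# so that scaling and encoding parameters are learned from the training data only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the transformations on the training set, then apply them to the test set
X_train = preprocessor.fit_transform(X_train)
X_test = preprocessor.transform(X_test)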

Step 2: Model Implementation

In [None]:
# Define the Neural Network model
nn_model = Sequential([
    Input(shape=X_train.shape[1:]), 
    Dense(64, activation='relu'),   
    Dense(64, activation='relu'),    
    Dense(1)                         
])

# Compile the Neural Network model
nn_model.compile(optimizer='adam', loss='mse')

# Train the Neural Network model
nn_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2, verbose=1)

# Define and train the Linear Regression model
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

Step 3: Model Evaluation and Comparison

In [None]:
# Predict using the Neural Network model
y_pred_nn = nn_model.predict(X_test)

# Calculate mean squared error for the Neural Network model
mse_nn = mean_squared_error(y_test, y_pred_nn)

# Calculate R-squared for the Neural Network model
r2_nn = r2_score(y_test, y_pred_nn)

# Predict using the Linear Regression model
y_pred_lr = lr_model.predict(X_test)

# Calculate mean squared error for the Linear Regression model
mse_lr = mean_squared_error(y_test, y_pred_lr)

# Calculate R-squared for the Linear Regression model
r2_lr = r2_score(y_test, y_pred_lr)

# Print the evaluation metrics for both models
print(f'Neural Network MSE: {mse_nn}, R2: {r2_nn}')
print(f'Linear Regression MSE: {mse_lr}, R2: {r2_lr}')

Step 4: Prediction on New Data

In [None]:
# Define the feature values for a new customer
new_customer = {'Age': 30, 'Experience(Years)': 5, 'Family': 3, 'Credit Score': 0.8, 'Mortgage(Thousands\'s)': 0, 
                'Education': 'Degree', 'Personal Loan': 'No', 'Securities Account': 'No', 'CD Account': 'No', 
                'Online Banking': 'Yes', 'CreditCard': 'No'}

# Convert the new customer data to a DataFrame
new_customer_df = pd.DataFrame([new_customer])

# Transform the new customer data using the preprocessor
new_customer_transformed = preprocessor.transform(new_customer_df)

# Predict the income using the Neural Network model
income_prediction_nn = nn_model.predict(new_customer_transformed)

# Predict the income using the Linear Regression model
income_prediction_lr = lr_model.predict(new_customer_transformed)

# Print the predictions for the new customer
print(f'Neural Network Prediction for new customer: {income_prediction_nn[0, 0]:.2f}')
print(f'Linear Regression Prediction for new customer: {income_prediction_lr[0]:.2f}')

**Data Preparation**

Loading and Inspection

The dataset 'BankRecords.csv' is loaded with pandas, and the head() and info() methods are used to inspect its first rows, column data types, and non-null counts. This inspection is a first check on data integrity and shows which variables are available for modeling.
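A minimal sketch of the inspection step, assuming a tiny hypothetical CSV held in memory (the column names mirror a few columns of BankRecords.csv, but the values are invented). Beyond head() and info(), counting missing values per column shows which columns the imputation step will have to fill.

```python
import io

import pandas as pd

# Hypothetical stand-in for BankRecords.csv; all values are made up
csv_text = """ID,Age,Family,Education
1,25,4,Degree
2,,3,Masters
3,39,,Degree
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head())
sample.info()  # prints dtypes and non-null counts; returns None

# Count missing values per column to see where imputation is needed
missing_counts = sample.isna().sum()
print(missing_counts)
```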

Handling Missing Values

Missing values in the numeric features 'Age', 'Experience(Years)', 'Family', 'Credit Score', and 'Mortgage(Thousands)' are replaced with the mean of their respective columns. Mean imputation keeps every row available for training and is simple to apply, although it shrinks the variance of the imputed columns and ignores relationships between features.
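A minimal standard-library sketch of column-mean imputation on a hypothetical Age column (values invented, with None marking missing entries). It also shows a known side effect of mean imputation: the column mean is preserved, but its spread shrinks.

```python
from statistics import mean, pstdev

# Hypothetical Age column; None marks a missing value
ages = [25, None, 39, 41, None, 35]

# Mean of the observed values only, like pandas' default skipna=True
observed = [a for a in ages if a is not None]
age_mean = mean(observed)

# Replace each missing entry with the column mean
ages_imputed = [age_mean if a is None else a for a in ages]
print(ages_imputed)

# The mean is unchanged, but the standard deviation gets smaller
print(mean(ages_imputed), pstdev(observed), pstdev(ages_imputed))
```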

Encoding Categorical Variables

The categorical variables 'Education', 'Personal Loan', 'Securities Account', 'CD Account', 'Online Banking', and 'CreditCard' are one-hot encoded: each category becomes its own 0/1 column. This gives both models a numeric representation of these variables without imposing an artificial order on the categories.
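To make the encoding concrete, here is a minimal standard-library sketch of one-hot encoding on hypothetical Education values (the real categories in BankRecords.csv may differ). Categories are sorted, which is also the order OneHotEncoder uses when it infers categories from the data.

```python
# Hypothetical Education values; the dataset's real labels may differ
education = ["Degree", "Masters", "Degree", "Professional"]

# Distinct categories in sorted order, one output column per category
categories = sorted(set(education))


def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    return [1 if value == c else 0 for c in categories]


encoded = [one_hot(v, categories) for v in education]
for value, row in zip(education, encoded):
    print(value, row)
```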

Feature Scaling

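The numeric features are standardized with StandardScaler, which rescales each numeric column to a mean of 0 and a standard deviation of 1, computed on the data the scaler is fitted on. This prevents features with larger numeric ranges from dominating the gradient updates of the neural network and generally helps it converge faster. The linear regression's predictions are unaffected by this rescaling, although its coefficients become comparable across features.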
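StandardScaler computes z = (x - mean) / std for each column, using the population standard deviation (ddof=0). A minimal standard-library sketch on hypothetical Experience(Years) values confirms that the result has mean 0 and standard deviation 1:

```python
from statistics import mean, pstdev

# Hypothetical Experience(Years) values
experience = [1.0, 3.0, 5.0, 7.0, 9.0]

mu = mean(experience)
sigma = pstdev(experience)  # population std (ddof=0), as StandardScaler uses

# z-score every value
z = [(x - mu) / sigma for x in experience]
print(z)
```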

Data Splitting

The dataset is split with the train_test_split function into a training set (80%) and a test set (20%), using random_state=42 for reproducibility. The models are trained on one subset and evaluated on the other, which estimates how well they generalize to unseen data.
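For the test-set estimate to be fair, the scaler and encoder should learn their parameters from the training rows only. One way to enforce this is to wrap preprocessing and model in a scikit-learn Pipeline and split before fitting. The sketch below assumes invented data (an Age column, a hypothetical three-level Education column, and a made-up target), so it illustrates the mechanics rather than the notebook's results.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic data standing in for BankRecords.csv
rng = np.random.default_rng(0)
n_rows = 200
demo_df = pd.DataFrame({
    "Age": rng.integers(23, 67, size=n_rows),
    "Education": rng.choice(["Undergrad", "Graduate", "Professional"], size=n_rows),
})
# Invented target: a noisy linear function of Age (not real income data)
demo_y = 2.0 * demo_df["Age"] + rng.normal(0, 5, size=n_rows)

# Split BEFORE fitting any transformer so the test rows never influence it
dX_train, dX_test, dy_train, dy_test = train_test_split(
    demo_df, demo_y, test_size=0.2, random_state=42
)

demo_model = Pipeline([
    ("prep", ColumnTransformer([
        ("num", StandardScaler(), ["Age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["Education"]),
    ])),
    ("reg", LinearRegression()),
])

# fit() learns scaling/encoding parameters from the training rows only;
# score() reuses those fitted parameters on the test rows
demo_model.fit(dX_train, dy_train)
print("test R2:", demo_model.score(dX_test, dy_test))

# The fitted scaler's mean comes from the 160 training rows, not all 200
demo_scaler = demo_model.named_steps["prep"].named_transformers_["num"]
print(demo_scaler.mean_[0], dX_train["Age"].mean())
```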

**Model Evaluation and Comparison**

Neural Network Model

A neural network with two hidden layers of 64 neurons each (ReLU activation) and a single linear output neuron is trained for 10 epochs with the Adam optimizer and a mean squared error (MSE) loss, using a batch size of 32 and holding out 20% of the training data for validation. On the test set, the neural network achieves an MSE of 823.83 and an R² score of 0.6115.
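To give a sense of the network's size: a Dense layer with n_in inputs and n_out units holds n_in * n_out weights plus n_out biases. The sketch below counts the trainable parameters of the 64-64-1 architecture for a given input width. The width of 20 is a hypothetical value, since the real width (X_train.shape[1]) depends on how many one-hot columns the categorical features produce.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected (Dense) layer."""
    return n_in * n_out + n_out


def count_params(n_features, hidden=(64, 64), n_outputs=1):
    """Total trainable parameters of an MLP with the given layer widths."""
    widths = [n_features, *hidden, n_outputs]
    return sum(dense_params(a, b) for a, b in zip(widths, widths[1:]))


# Hypothetical input width; the notebook's real value is X_train.shape[1]
print(count_params(20))  # prints 5569
```

For the trained model itself, nn_model.summary() reports the parameter count directly.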

Linear Regression Model

A linear regression model is trained and evaluated on the same training and test sets, resulting in an MSE of 926.37 and an R² score of 0.5631.
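Both metrics can be written out directly. The sketch below computes MSE and R² from their definitions on invented toy values, mirroring what mean_squared_error and r2_score return for one-dimensional targets.

```python
def mse_manual(y_true, y_pred):
    """Mean of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_manual(y_true, y_pred):
    """1 - SS_res / SS_tot, with SS_tot measured around the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy values, invented for illustration
toy_true = [40.0, 60.0, 80.0, 100.0]
toy_pred = [45.0, 55.0, 85.0, 95.0]
print(mse_manual(toy_true, toy_pred), r2_manual(toy_true, toy_pred))  # prints 25.0 0.95
```

Because both metrics are computed on the same test set, R² = 1 - MSE / Var(y_test) with the population variance, so the two metrics always rank the models the same way. The reported pairs are consistent with this: 823.83 / (1 - 0.6115) and 926.37 / (1 - 0.5631) both come to roughly 2,120, the implied variance of the test-set target.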

**Findings and Final Rationale**

Both models explain a moderate share of the variance in customer income (R² of 0.61 for the neural network and 0.56 for linear regression). The neural network outperforms linear regression on both MSE and R², which suggests that it captures some non-linear relationships between the features and income that a linear model cannot. Because the comparison rests on a single train/test split and an unseeded network, the size of the gap is worth confirming with repeated splits or cross-validation. Both models may also benefit from further tuning and feature engineering, for example adjusting the network architecture, tuning hyperparameters such as the number of epochs and the batch size, and selecting features.
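One way to check whether a performance gap is stable is to evaluate over several folds instead of one split. A minimal sketch with scikit-learn's cross_val_score on invented data follows. Only the linear model is shown: the Keras network would need a wrapper or a manual loop over KFold splits, and on the real data the preprocessor should sit inside a Pipeline so it is refitted within each fold.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Invented regression data; in practice use the bank features and income target
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 5))
y_demo = X_demo @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=1.0, size=300)

cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LinearRegression(), X_demo, y_demo, cv=cv, scoring="r2")

# The spread across folds shows how much a single-split R2 can move
print(scores.round(3), scores.mean().round(3), scores.std().round(3))
```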

Prediction for New Customer

To predict the income of a new customer who is not in the original dataset, the customer's feature values are transformed with the fitted preprocessor and passed to both trained models. The neural network predicts an income of approximately 44,358.21, while the linear regression model predicts approximately 44,774.21. The two estimates are close to each other, but given the test-set errors reported above they should be read as rough indications of the customer's income level rather than precise figures.
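Predicting for a new record only works smoothly if each of its categories was seen when the encoder was fitted; by default, OneHotEncoder raises an error on an unseen category. A minimal sketch with hypothetical category labels shows that handle_unknown='ignore' instead encodes an unseen value as an all-zero row:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training labels for Education
train_cats = pd.DataFrame({"Education": ["Undergrad", "Graduate", "Graduate", "Professional"]})

enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(train_cats)

# A seen category gets a single 1; an unseen one ('Degree' here) gets all zeros
seen_row = enc.transform(pd.DataFrame({"Education": ["Graduate"]})).toarray()
unseen_row = enc.transform(pd.DataFrame({"Education": ["Degree"]})).toarray()
print(enc.categories_, seen_row, unseen_row)
```

The trade-off is that an all-zero row silently drops that feature's information, so it is worth checking a new customer's labels against enc.categories_ (or preprocessor.named_transformers_['cat'].categories_ in the notebook) before trusting the prediction.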

**2. Semantic Analysis**

*Installing the needed libraries*

In [None]:
%pip install pandas nltk matplotlib seaborn

Importing the libraries

In [None]:
import pandas as pd
import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

Downloading NLTK Resources

In [None]:
nltk.download('stopwords')
nltk.download('vader_lexicon')
nltk.download('punkt')
nltk.download('punkt_tab')  # needed by word_tokenize in newer NLTK releases

Step 1: Data loading and inspection

In [None]:
# Load the dataset #

data = pd.read_csv('all_annotated.csv', encoding='latin1')

# Display the first few rows of the dataset #
print(data.head())

Step 2: Data cleaning and preparation

In [None]:
# Filter the dataset to include only English tweets #

data_english = data[data['Definitely English'] == 1].copy()

Step 3: Defining text pre-processing function

In [None]:
# Define stop words set from NLTK's English stopwords #

stop_words = set(stopwords.words('english'))

# Define a text preprocessing function #

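def preprocess_text(text):
    # Guard against missing values (NaN), which are floats rather than strings
    if not isinstance(text, str):
        return ''
    text = re.sub(r'<[^>]+>', '', text)  # remove HTML tags
    text = re.sub(r'\W', ' ', text)      # replace non-word characters with spaces
    text = re.sub(r'\s+', ' ', text)     # collapse repeated whitespace
    text = text.lower()                  # lowercase to match the stopword list
    tokens = word_tokenize(text)
    tokens = [word for word in tokens if word not in stop_words]  # drop stopwords
    return ' '.join(tokens)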
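A dependency-free sketch of the same cleaning steps, useful for checking the regular expressions on a sample string without downloading NLTK data. It assumes a small hypothetical stopword subset in place of NLTK's full English list and uses str.split() in place of word_tokenize; because the \W substitution has already turned punctuation into spaces, the two tokenizations are close, though not guaranteed identical.

```python
import re

# Hypothetical stopword subset; the notebook uses NLTK's full English list
STOP_WORDS = {"the", "a", "is", "and", "to", "of"}


def preprocess_text_demo(text):
    text = re.sub(r"<[^>]+>", "", text)  # strip HTML tags such as <b>
    text = re.sub(r"\W", " ", text)      # replace punctuation/symbols with spaces
    text = re.sub(r"\s+", " ", text)     # collapse runs of whitespace
    text = text.lower()
    tokens = text.split()                # whitespace split stands in for word_tokenize
    return " ".join(t for t in tokens if t not in STOP_WORDS)


print(preprocess_text_demo("<b>The</b> movie is GREAT, and fun to watch!!"))
# prints: movie great fun watch
```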

Step 4: 