<a href="https://colab.research.google.com/github/carlos-alves-one/-AI-Coursework-1/blob/main/neuro_credit_report.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Goldsmiths University of London
### MSc. Data Science and Artificial Intelligence
### Module: Artificial Intelligence
### Author: Carlos Manuel De Oliveira Alves
### Student: cdeol003
### Coursework No.1

# Project
NeuroCredit: Leveraging ANN for Credit Risk Assessment

# Introduction
In the previous academic year, I built an extensive dataset as part of the research project for my Computer Science degree. Titled "Neuro Credit Analysis", it brings together a wide range of factors relevant to credit analysis in a single framework and served as the basis for investigating credit rating and financial behaviour with computational approaches. The aim is to use machine learning, data analysis, and predictive modelling to uncover the patterns and determinants of credit activity, and in particular how various characteristics affect an individual's creditworthiness and the probability of loan approval. Building the dataset involved careful data gathering, thorough cleaning, and precise organisation. It enabled a comprehensive analysis for that project and contributed to my study of financial data.

# Methodology

## Framing the Problem in Neuro Credit Analysis
In data science and machine learning, defining the problem is critical because it lays the foundation for all subsequent analysis and modelling. It entails setting clear objectives, understanding the context, and identifying appropriate analytical methods or machine learning algorithms. For the Neuro Credit Analysis dataset, the problem can be framed as follows:
- Defining the Objective: The main objective is to understand and predict the factors that influence creditworthiness and loan approval decisions. This entails identifying the attributes that substantially affect an individual's risk profile and the probability of loan approval or denial.
- Understanding the Context: The context is credit analysis within finance. This requires an understanding of how criteria such as credit history, income, employment status, and other relevant variables interact to shape credit decisions. Understanding the implications of these factors is essential for a thorough analysis.
- Determining the Analytical Approach:
  - Exploratory Data Analysis (EDA): Exploring the data is a crucial first step before building complex models. It covers visualising the distributions of the features, examining correlations, and detecting patterns or anomalies in the data.
  - Statistical Analysis: Hypothesis testing may be needed to determine which criteria are significantly associated with loan approval.
  - Predictive Modelling: The problem can be approached as a classification task to predict whether a loan will be approved or rejected based on the given attributes.
- Choosing the Right Tools and Techniques:
  - For EDA, tools like pandas, Matplotlib, and Seaborn in Python are ideal.
  - For statistical analysis, techniques like chi-square tests for categorical variables and t-tests or ANOVA for numerical variables might be relevant.
  - Predictive modelling can use machine learning techniques such as logistic regression, decision trees, random forests, or gradient boosting machines. In this project, we will implement a basic artificial neural network limited to a sequential stack of dense and dropout layers.
- Evaluating the Model: Choosing the right metrics for model evaluation is critical. In a credit approval context, accuracy, precision, recall, F1-score, and ROC-AUC are all essential for evaluating the model's performance.
- Understanding Ethical Implications: Credit analysis models must be fair and unbiased. It is essential to ensure that the model does not inadvertently discriminate against any group of individuals based on sensitive attributes like race, gender, or age.
- Real-World Viability: The model should not only be accurate but also practical and interpretable. This means it should be capable of being deployed in a real-world financial setting and provide understandable reasons for its decisions.
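
As a rough illustration of the chi-square test mentioned above for categorical variables, the sketch below computes the test statistic by hand for a small contingency table of employment status against approval status. The counts are made up for illustration and are not taken from the dataset, and the "Employed" category and "Approved" label are assumptions. In practice a library routine such as `scipy.stats.chi2_contingency` would also return the p-value.

```python
# Hypothetical contingency table (illustrative counts, NOT from the dataset):
# rows = employment_status, columns = approval_status as [Approved, Rejected]
observed = {
    "Employed": [60, 40],        # assumed category, not shown in the preview
    "Self-Employed": [30, 30],
    "Unemployed": [10, 30],
}


def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for r, row in enumerate(rows):
        for c, obs in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[r] * col_totals[c] / grand_total
            statistic += (obs - expected) ** 2 / expected

    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


stat, dof = chi_square_statistic(observed)
print(f"chi-square = {stat:.2f}, degrees of freedom = {dof}")
```

A large statistic relative to the chi-square distribution with the given degrees of freedom suggests that approval status is not independent of the categorical variable being tested.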

Framing the problem in this structured manner ensures a focused and practical approach to analysing the Neuro Credit Analysis dataset. It guides the selection of methodologies and techniques and lays the foundation for generating insights that are both meaningful and actionable in credit analysis.
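
To make the evaluation metrics named above concrete, here is a minimal sketch that derives accuracy, precision, recall, and F1-score from true and predicted labels. The labels are toy values chosen for illustration, and treating "Approved" as the positive class is an assumption rather than something fixed by the dataset.

```python
def classification_metrics(y_true, y_pred, positive="Approved"):
    """Compute basic binary classification metrics from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only (not model output from this project)
y_true = ["Approved", "Rejected", "Rejected", "Approved",
          "Rejected", "Approved", "Rejected", "Rejected"]
y_pred = ["Approved", "Rejected", "Approved", "Rejected",
          "Rejected", "Approved", "Rejected", "Rejected"]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a credit setting, precision and recall are often more informative than accuracy alone, because approved and rejected applications may be heavily imbalanced.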

## Preparing the Data
Data preparation is crucial in data science, especially when dealing with complex datasets like Neuro Credit Analysis. This step transforms the raw data into a format suitable for analysis or modelling. Here is how the data can be prepared:

### Load the data

In [1]:
# Imports the 'drive' module from 'google.colab' and mounts the Google Drive to
# the '/content/drive' directory in the Colab environment.
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
# Import the pandas library and give it the alias 'pd' for data manipulation and analysis
import pandas as pd

# Load the Neuro Credit dataset from Google Drive
data_path = '/content/drive/MyDrive/neuro_credit.csv'
neuro_credit_data = pd.read_csv(data_path)

# Display the first few rows of the dataframe
neuro_credit_data.head()

Unnamed: 0,credit_history,employment_status,collateral,payment_history,type_of_credit_accounts,public_records_and_collections,purpose_of_loan,income,assets_value,debt_to_income_ratio,length_of_credit_history,number_of_credit_inquiries,number_of_credit_accounts,number_of_credit_accounts_opened_last_12_months,current_balance_of_credit_accounts,total_credit_limit,total_credit_utilization,loan_amount,saving_account_balance,approval_status
0,Fair,Unemployed,Other,Excellent,Student,Other,Home Improvement,46319,14680,41,24,5,3,4,9591,17977,61,4315,10207,Rejected
1,Good,Self-Employed,,Poor,Auto,Tax Lien,Debt Consolidation,15480,46713,82,0,5,3,2,8727,2399,60,4263,16666,Rejected
2,Fair,Self-Employed,House,Fair,Personal,Bankruptcy,Home Improvement,21614,13026,68,99,1,4,2,13997,10655,90,2334,10413,Rejected
3,Good,Unemployed,Other,Poor,Mortgage,Other,Car Financing,25874,27908,34,55,4,4,2,15684,19176,54,4010,16645,Rejected
4,Fair,Unemployed,Land,Excellent,Personal,,Car Financing,20389,44309,75,30,4,0,5,1197,4449,100,2977,16366,Rejected
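
The preview shows empty entries in `collateral` and `public_records_and_collections`, which suggests values that are either genuinely missing or were parsed as missing on load. The sketch below rebuilds a few columns of the five preview rows and shows one possible way to count the gaps, fill them, one-hot encode the categorical features, and encode the target. The "Unknown" placeholder and the "Approved" label are assumptions made for illustration; the preparation actually used in this project may differ.

```python
import pandas as pd

# A few columns from the five preview rows shown above
preview = pd.DataFrame({
    "employment_status": ["Unemployed", "Self-Employed", "Self-Employed",
                          "Unemployed", "Unemployed"],
    "collateral": ["Other", None, "House", "Other", "Land"],
    "public_records_and_collections": ["Other", "Tax Lien", "Bankruptcy",
                                       "Other", None],
    "income": [46319, 15480, 21614, 25874, 20389],
    "approval_status": ["Rejected"] * 5,
})

# Count missing values per column
missing_counts = preview.isna().sum()
print(missing_counts)

# Fill missing categories with an explicit placeholder (assumed strategy)
categorical_cols = ["employment_status", "collateral",
                    "public_records_and_collections"]
filled = preview.copy()
filled[categorical_cols] = filled[categorical_cols].fillna("Unknown")

# One-hot encode the categorical features; numeric columns pass through
features = pd.get_dummies(filled.drop(columns="approval_status"),
                          columns=categorical_cols)

# Binary target: 1 for "Approved" (assumed label), 0 otherwise
target = (filled["approval_status"] == "Approved").astype(int)

print(features.columns.tolist())
```

Keeping a separate "Unknown" category preserves the information that a value was absent, which may itself be predictive of the approval decision.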
