# Machine Learning for Finance

This notebook provides an introduction to machine learning for finance. We will cover the following topics:

* **What is machine learning?**
* **Types of machine learning**
* **Applications of machine learning in finance**
* **Building a machine learning model**

## 1. What is machine learning?

Machine learning is a subfield of artificial intelligence (AI) in which computers learn patterns from data instead of following explicitly programmed rules. A machine learning algorithm is trained on historical data to produce a model, and that model is then used to make predictions or decisions on new data.
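To make "trained on data" concrete, here is a minimal sketch using scikit-learn's `LinearRegression` on toy numbers made up for illustration. The model is never told the rule y = 2x + 1. It estimates that rule from the examples alone and then applies it to an input it has not seen.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated from a rule the model is NOT told about: y = 2x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 2 * X.ravel() + 1

# Training: the algorithm estimates the relationship from the examples
model = LinearRegression()
model.fit(X, y)

# Prediction: apply the learned relationship to a new, unseen input
prediction = model.predict(np.array([[10.0]]))
print("slope:", model.coef_[0], "intercept:", model.intercept_, "prediction:", prediction[0])
```

The printed slope and intercept should be very close to 2 and 1, and the prediction close to 21.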

## 2. Types of machine learning

There are three main types of machine learning:

* **Supervised learning:** The algorithm is trained on labeled data, which means that each data point is tagged with the correct output.
* **Unsupervised learning:** The algorithm is trained on unlabeled data, and it must find patterns in the data on its own.
* **Reinforcement learning:** An agent learns by trial and error: it takes actions in an environment and receives rewards (or penalties) that it uses to improve its future decisions.
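The three types can be contrasted in one small sketch on synthetic data (all values are made up). Supervised learning uses scikit-learn's `LogisticRegression`, which is given the labels. Unsupervised learning uses `KMeans`, which sees only the features. Reinforcement learning is reduced to its simplest form, an epsilon-greedy multi-armed bandit, in which a hypothetical agent learns which of three actions pays best purely from the rewards it receives.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: two well-separated groups of 2-D points (hypothetical features)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])
labels = np.array([0] * 50 + [1] * 50)

# Supervised: the model is given the correct label for every point
clf = LogisticRegression().fit(X, labels)
supervised_acc = clf.score(X, labels)

# Unsupervised: KMeans sees only X and must discover the two groups itself
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
clusters = km.labels_
# Cluster ids are arbitrary, so measure agreement up to swapping 0 and 1
cluster_agreement = max(np.mean(clusters == labels), np.mean(clusters != labels))

# Reinforcement (multi-armed bandit): the agent repeatedly picks one of three
# actions, observes a noisy reward, and updates its estimate of each action's value
true_means = np.array([0.1, 0.5, 0.9])  # hidden from the agent
estimates = np.zeros(3)
counts = np.zeros(3)
epsilon = 0.1  # fraction of steps spent exploring at random
for _ in range(2000):
    if rng.random() < epsilon:
        action = int(rng.integers(3))        # explore: random action
    else:
        action = int(np.argmax(estimates))   # exploit: best action so far
    reward = rng.normal(true_means[action], 0.1)
    counts[action] += 1
    # Incremental running-mean update of the chosen action's value
    estimates[action] += (reward - estimates[action]) / counts[action]

best_action = int(np.argmax(estimates))
print("supervised accuracy:", supervised_acc)
print("cluster agreement:", cluster_agreement)
print("best action learned by the agent:", best_action)
```

The bandit omits the notion of state that full reinforcement learning problems have, which is why it fits in a few lines.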

## 3. Applications of machine learning in finance

Machine learning has a wide range of applications in finance, including:

* **Algorithmic trading:** Machine learning algorithms can be used to develop and execute trading strategies.
* **Fraud detection:** Machine learning algorithms can be used to detect fraudulent transactions.
* **Credit scoring:** Machine learning algorithms can be used to assess the creditworthiness of borrowers.
* **Portfolio management:** Machine learning algorithms can be used to optimize investment portfolios.
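As one illustration of the fraud-detection use case, the sketch below applies scikit-learn's `IsolationForest`, an unsupervised anomaly detector, to hypothetical transactions described by amount and hour of day. Production fraud systems typically use many more features and often supervised models trained on confirmed fraud cases; the data, the two features, and the `contamination` value here are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Hypothetical transactions: columns are (amount in dollars, hour of day)
normal = np.column_stack([
    rng.normal(50, 15, size=500),   # typical purchase amounts
    rng.normal(14, 3, size=500),    # mostly daytime activity
])
suspicious = np.array([[5000.0, 3.0], [7500.0, 4.0], [6200.0, 2.0]])
transactions = np.vstack([normal, suspicious])  # suspicious rows are indices 500-502

# Isolation Forest flags points that are easy to isolate from the rest;
# contamination is the assumed share of anomalies in the data
detector = IsolationForest(contamination=0.01, random_state=42)
flags = detector.fit_predict(transactions)  # -1 = anomaly, 1 = normal

flagged_idx = np.where(flags == -1)[0]
print("Flagged transaction indices:", flagged_idx)
```

Because the detector never sees fraud labels, its output is a list of candidates for review rather than a confirmed classification.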

## 4. Building a machine learning model

The process of building a machine learning model typically involves the following steps:

1.  **Data collection:** The first step is to collect the data that will be used to train the model.
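2.  **Data preparation:** The data must be cleaned and preprocessed before it can be used to train the model, for example by handling missing values, scaling numeric features, and encoding categorical variables.
3.  **Model selection:** The next step is to select the appropriate machine learning model for the task.
4.  **Model training:** The model's parameters are fit to the training data.
5.  **Model evaluation:** The model is evaluated on a held-out test set (data not used during training) to estimate how well it will perform on new data.
6.  **Model deployment:** The trained model is deployed to production, where it makes predictions on new data.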
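The six steps can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions: synthetic data from `make_classification` stands in for data collection, the two candidate models and the column names `f1` to `f4` are arbitrary choices, and "deployment" is reduced to saving and reloading the model with `joblib`.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data collection (stand-in): synthetic labeled data instead of a real source
X_raw, y = make_classification(n_samples=500, n_features=4, random_state=0)
df = pd.DataFrame(X_raw, columns=["f1", "f2", "f3", "f4"])
df.loc[::50, "f1"] = np.nan  # inject a few missing values to clean up
df["target"] = y

# 2. Data preparation: drop incomplete rows, then hold out a test set
df = df.dropna()
X = df[["f1", "f2", "f3", "f4"]]
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 3. Model selection: compare candidates by 5-fold cross-validation on the training set only
candidates = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression()),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
cv_scores = {name: cross_val_score(m, X_train, y_train, cv=5).mean() for name, m in candidates.items()}
best_name = max(cv_scores, key=cv_scores.get)

# 4. Model training: refit the chosen model on the full training set
best_model = candidates[best_name].fit(X_train, y_train)

# 5. Model evaluation: score once on the untouched test set
test_accuracy = best_model.score(X_test, y_test)

# 6. Model deployment (simplified): persist the model, then reload it as a serving process would
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(best_model, path)
loaded = joblib.load(path)
print(best_name, round(test_accuracy, 3))
```

Model selection uses only the training split, so the test score in step 5 is not biased by the choice between candidates.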

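```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the data
data = pd.read_csv('https://raw.githubusercontent.com/adamvangrover/craft/main/Interactive_Notebooks/Financial_Modeling/data/mega_cap_equity_data.csv')

# Prepare the data: keep the two numeric features and the target,
# dropping rows with missing values so the model can be fit
data = data[['Market Cap', 'PE Ratio', 'Sector']].dropna()
X = data[['Market Cap', 'PE Ratio']]
y = data['Sector']

# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Select and train the model. Market cap and P/E ratio are on very different
# scales, so standardize them before fitting the logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Evaluate the model: score() returns the mean accuracy on the test set
accuracy = model.score(X_test, y_test)
print('Accuracy:', accuracy)
```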
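Accuracy on its own can be misleading when one sector dominates the data: a model that always predicts the largest class may score well without learning anything. A common check is to compare against scikit-learn's `DummyClassifier` and to inspect per-class precision and recall. The sketch below uses hypothetical synthetic data (the sector names and class balance are assumptions) so it runs without the CSV; with the real data you would reuse `X_train`, `X_test`, `y_train`, and `y_test` from the cell above.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical, imbalanced stand-in for the sector data: about 80% of rows in one class
n = 500
y = np.where(rng.random(n) < 0.8, "Technology", "Energy")
# Two features weakly related to the label (placeholders for market cap and P/E ratio)
X = rng.normal(size=(n, 2)) + np.where(y == "Energy", 1.0, 0.0)[:, None]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Baseline: always predict the most frequent class seen in training
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)

baseline_acc = baseline.score(X_test, y_test)
model_acc = model.score(X_test, y_test)
print("Baseline accuracy:", baseline_acc)
print("Model accuracy:   ", model_acc)

# Per-class precision and recall show whether the minority class is ever predicted
print(classification_report(y_test, model.predict(X_test), zero_division=0))
```

If the model's accuracy is not clearly above the baseline, the features are carrying little information about the target.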