# Introduction to Machine Learning
Machine learning is a branch of artificial intelligence (AI) that focuses on building applications that learn from data and improve their accuracy over time without being explicitly programmed to do so.
In this notebook, we will go through the basics of machine learning, covering fundamental concepts and steps involved in building a machine learning model.

# Understanding Data
Data is the foundation of machine learning. Each feature can be numerical (such as a length in centimetres) or categorical (such as a species name).
In this section, we will load and explore a sample dataset.

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


In [2]:
# Load the dataset
url = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'
data = pd.read_csv(url)
data.head()
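
Beyond `head()`, a few pandas calls give a quick picture of which columns are numerical and which are categorical. The sketch below uses a small hand-made frame with the same column names as the Iris CSV; the values are hypothetical stand-ins so that the example runs without network access.

```python
import pandas as pd

# Hypothetical stand-in for the Iris data: four numerical columns, one categorical.
sample = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3, 4.9],
    "sepal_width": [3.5, 3.2, 3.3, 3.0],
    "petal_length": [1.4, 4.7, 6.0, 1.4],
    "petal_width": [0.2, 1.4, 2.5, 0.2],
    "species": ["setosa", "versicolor", "virginica", "setosa"],
})

# Separate numerical from categorical columns by dtype.
numerical_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

# Summary statistics for the numerical columns, class counts for the categorical one.
summary = sample[numerical_cols].describe()
class_counts = sample["species"].value_counts()

print(numerical_cols)
print(categorical_cols)
print(class_counts)
```

The same calls should work unchanged on the `data` frame loaded above.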

# Data Preprocessing
Before training a model, we need to preprocess the data. This includes handling missing values, encoding categorical data, and feature scaling.

In [3]:
# Check for missing values
data.isnull().sum()

In [4]:
# Encoding categorical data
data = pd.get_dummies(data, columns=['species'], drop_first=True)
data.head()
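
To see what `drop_first=True` does, the following sketch encodes a tiny hypothetical `species` column with and without the option. Dropping the first category makes it the implicit baseline, which avoids a redundant indicator column that would be perfectly collinear with the intercept of a linear model.

```python
import pandas as pd

# Hypothetical categorical column with the three Iris species names.
species = pd.DataFrame({"species": ["setosa", "versicolor", "virginica", "setosa"]})

# Full one-hot encoding: one indicator column per category.
full = pd.get_dummies(species, columns=["species"])

# drop_first=True removes the first category in sorted order ("setosa").
# A row whose remaining indicators are all zero/False then means "setosa".
reduced = pd.get_dummies(species, columns=["species"], drop_first=True)

print(list(full.columns))
print(list(reduced.columns))
```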

In [5]:
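# Feature scaling: standardize each numerical column to zero mean and unit variance
# Note: this also scales sepal_length (the target used later) and fits the scaler on
# all rows before the split, which leaks test-set statistics; acceptable for a demo only
numeric_cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
scaler = StandardScaler()
data[numeric_cols] = scaler.fit_transform(data[numeric_cols])
data.head()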
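
Standardization replaces each value x with z = (x - mean) / std, computed per column. As a minimal sketch on hypothetical values, the same transformation can be written directly in NumPy; `StandardScaler` uses the population standard deviation (`ddof=0`), so that convention is used here as well.

```python
import numpy as np

# Hypothetical feature values for a single column.
x = np.array([4.9, 5.1, 6.3, 7.0])

# Subtract the mean and divide by the population standard deviation (ddof=0).
z = (x - x.mean()) / x.std(ddof=0)

# After standardization the column has mean ~0 and standard deviation ~1.
print(z.mean(), z.std())
```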

# Splitting Data
Splitting the data into training and test sets lets us evaluate the model on examples it did not see during training, which gives a more honest estimate of its performance.

In [6]:
# Splitting the data into training and test sets
X = data.drop('sepal_length', axis=1)
y = data['sepal_length']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.shape, X_test.shape
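
In this notebook the scaler was fit before the split, so the test rows influenced the scaling statistics. A common alternative is to split first and fit the scaler on the training portion only. The sketch below illustrates that ordering on synthetic data; the `*_demo`, `X_tr`, and `X_te` names are hypothetical and chosen so they do not overwrite the notebook's variables.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 50 rows, 3 features (hypothetical values).
rng = np.random.default_rng(42)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(50, 3))
y_demo = rng.normal(size=50)

# Split first, so the test rows stay unseen during preprocessing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Fit the scaler on the training rows only, then reuse its statistics on the test rows.
demo_scaler = StandardScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr)
X_te_scaled = demo_scaler.transform(X_te)

print(X_tr_scaled.shape, X_te_scaled.shape)
```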

# Model Selection
Choosing the right model is essential for making accurate predictions. Here, we'll use Linear Regression as an example, predicting sepal length from the remaining features.

In [7]:
# Define the model
model = LinearRegression()
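
One simple way to judge whether a model is a reasonable choice is to compare it against a trivial baseline under cross-validation. The following is a sketch on synthetic data, not part of the original workflow: `DummyRegressor` always predicts the training mean, and the data, variable names, and choice of 5 folds are assumptions made for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data with a linear signal plus a little noise (hypothetical).
rng = np.random.default_rng(0)
X_cv = rng.normal(size=(100, 2))
y_cv = 3.0 * X_cv[:, 0] - 2.0 * X_cv[:, 1] + rng.normal(scale=0.1, size=100)

candidates = {
    "baseline (mean)": DummyRegressor(strategy="mean"),
    "linear regression": LinearRegression(),
}

# Mean R2 across 5 folds for each candidate.
cv_scores = {
    name: cross_val_score(est, X_cv, y_cv, cv=5, scoring="r2").mean()
    for name, est in candidates.items()
}

for name, score in cv_scores.items():
    print(f"{name}: {score:.3f}")
```

A model that cannot clearly beat the mean baseline is probably not capturing useful structure in the data.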

# Model Training
Train the model using the training data.

In [8]:
# Train the model
model.fit(X_train, y_train)
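
Fitting a linear regression estimates one coefficient per feature plus an intercept, which are exposed as `coef_` and `intercept_` after `fit`. As a small sketch on noise-free synthetic data (the `X_lin`, `y_lin`, and `lin` names are hypothetical), the fitted values recover the weights used to generate the target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free synthetic data generated as y = 2*x1 - 1*x2 + 0.5.
X_lin = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_lin = 2.0 * X_lin[:, 0] - 1.0 * X_lin[:, 1] + 0.5

lin = LinearRegression().fit(X_lin, y_lin)

# One coefficient per feature, plus the intercept.
print(lin.coef_, lin.intercept_)
```

On the notebook's model, pairing `model.coef_` with `X_train.columns` shows which feature each coefficient belongs to.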

# Model Evaluation
Evaluate the model's performance using metrics such as Mean Squared Error (MSE) and R-squared (R2).

In [9]:
# Make predictions
y_pred = model.predict(X_test)

# Calculate MSE and R2
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
mse, r2
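
To make the two metrics concrete, here is a sketch that computes them by hand on four hypothetical values. MSE is the mean of the squared residuals; R2 is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values.

```python
import numpy as np

# Hypothetical true values and predictions.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 7.5, 10.0])

# Mean Squared Error: average of squared residuals.
residuals = y_true - y_hat
mse_manual = np.mean(residuals ** 2)

# R-squared: 1 - SS_res / SS_tot.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(mse_manual, r2_manual)
```

Lower MSE is better, while R2 is at most 1 and falls to 0 for a model that does no better than predicting the mean.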

# Making Predictions
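Use the trained model to make predictions on new data. The new data must have the same features, in the same order and on the same scale, as the training data.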

In [10]:
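# Example prediction: the column order must match X
# (sepal_width, petal_length, petal_width, species_versicolor, species_virginica)
new_data = pd.DataFrame([[0.2, 0.3, 0.4, 1, 0]], columns=X.columns)  # Scaled features with one-hot encoded species
predicted_sepal_length = model.predict(new_data)  # Result is in standardized units, not centimetres
predicted_sepal_length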
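
Because the scaler in this notebook was also applied to `sepal_length`, the prediction above is in standardized units. A fitted `StandardScaler` stores the per-column statistics in `mean_` and `scale_`, so a value can be converted back with x = z * std + mean. The sketch below shows the idea on hypothetical measurements; `raw`, `unit_scaler`, and `pred_scaled` are illustrative names, and column 0 plays the role of the target.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical measurements in centimetres; column 0 stands in for the target.
raw = np.array([[5.1, 3.5], [4.9, 3.0], [6.3, 3.3], [7.0, 3.2]])
unit_scaler = StandardScaler().fit(raw)

# Suppose a model returned this standardized prediction for column 0.
pred_scaled = 0.5

# Undo standardization using the statistics stored for column 0.
pred_cm = pred_scaled * unit_scaler.scale_[0] + unit_scaler.mean_[0]

print(pred_cm)
```

In the notebook, `sepal_length` is the first column passed to `scaler`, so index 0 would be the matching position there as well.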

# Conclusion
In this notebook, we covered the basics of machine learning, including data preprocessing, model training, and evaluation. We used Linear Regression as an example model. The next steps involve experimenting with different models and datasets to further your understanding of machine learning.