# Jupyter Notebook Setup and Machine Learning Workflow
### Tutorial for COMP4388 Assignment
This notebook will guide you through:
1. Setting up and using Jupyter Notebook.
2. Working with essential Python libraries: Pandas, Matplotlib, and Scikit-learn.
3. Implementing tasks from the project.

---

## Section 0: Introduction to Jupyter Notebook and Python
Jupyter Notebook is an open-source web application that allows you to create and share documents containing live code, equations, visualizations, and narrative text. It's a powerful tool for data analysis and machine learning.

### Options for Running Jupyter Notebook
1. **Local Installation**: Install Jupyter Notebook on your computer using Anaconda or pip, and write your code in an editor such as VS Code or PyCharm.
2. **Google Colab**: Use Google Colab, a free cloud service with GPU support. It's a great option if you don't want to install anything on your computer.
3. **Kaggle Notebooks**: Use Kaggle Notebooks (formerly Kaggle Kernels), another cloud-based platform for data science and machine learning.

### Notebook Cells
- **Code Cells**: Write and execute Python code.
- **Markdown Cells**: Write text using Markdown syntax.

### Keyboard Shortcuts
- **Run Cell**: Shift + Enter


---


### Python Basics
You can learn Python basics from the following resources:
- [Python Tutorial (w3schools)](https://www.w3schools.com/python/)
- [Python Documentation](https://docs.python.org/3/tutorial/index.html)
- [Python for Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/)

## Section 1: Install Required Libraries (Can be skipped if using Google Colab)
Run the command below in a terminal to install the required libraries:
`pip install notebook pandas matplotlib scikit-learn seaborn`
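
To confirm that the installation worked, a short check like the one below can list the installed version of each package. This is only a sketch: the helper name `installed_versions` is made up for illustration, and it relies on Python's standard `importlib.metadata` module (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version

# Package names taken from the pip command above
PACKAGES = ["notebook", "pandas", "matplotlib", "scikit-learn", "seaborn"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` can be added by rerunning the `pip install` command for that package.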

## Section 2: Key Libraries and Their Uses
We'll use:
- `pandas` for data analysis and manipulation.
- `matplotlib` and `seaborn` for data visualization.
- `scikit-learn` for machine learning.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


## Section 3: Working with the Dataset
### Load the Dataset
We'll load the dataset and perform basic exploration.

In [4]:
# Load the dataset
data = pd.read_csv('Customer Churn.csv')

# Show the first few rows
print("Dataset Overview:")
data.head()

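Dataset Overview:

| Unnamed: 0 | ID | Call Failure | Complains | Charge Amount | Freq. of use | Freq. of SMS | Distinct Called Numbers | Age Group | Plan | Status | Age | Customer Value | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 3 | no | 100 | 25 | 32 | 11 | 3 | pre-paid | active | 30 | 193.12 | no |
| 1 | 2 | 8 | no | 100 | 65 | 0 | 13 | 2 | pre-paid | active | 25 | 194.4 | yes |
| 2 | 3 | 0 | no | 200 | 0 | 0 | 0 | 2 | pre-paid | not-active | 25 | 0.0 | yes |
| 3 | 4 | 10 | no | 100 | 54 | 327 | 20 | 2 | pre-paid | active | 25 | 1579.14 | yes |
| 4 | 5 | 10 | no | 100 | 60 | 0 | 31 | 1 | pre-paid | active | 15 | 227.865 | yes |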


In [ ]:
# Display column names, data types, and non-null counts
print("\nDataset Info:")
data.info()

In [ ]:
# Check for missing values
print("\nMissing Values:")
print(data.isnull().sum())
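
If the check above reports missing values, two common ways to handle them are dropping the affected rows or filling the gaps. The sketch below uses a small made-up frame (`toy`) rather than the real dataset, and the choice of median and mode as fill values is an assumption, not a requirement of the assignment.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "Age": [30, 25, np.nan, 15],
    "Plan": ["pre-paid", None, "pre-paid", "post-paid"],
    "Churn": ["no", "yes", "yes", "no"],
})

# Share of missing values per column, as a percentage
missing_pct = toy.isnull().mean() * 100
print(missing_pct)

# Option 1: drop every row that contains a missing value
dropped = toy.dropna()

# Option 2: fill numeric gaps with the median and text gaps with the most frequent value
filled = toy.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].median())
filled["Plan"] = filled["Plan"].fillna(filled["Plan"].mode()[0])
print(filled)
```

Dropping rows is simplest but discards data; filling keeps every row at the cost of introducing estimated values.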

## Section 4: Exploratory Data Analysis (EDA)
### 1. Summary Statistics

In [None]:
# Display summary statistics
data.describe()
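
By default, `describe()` summarizes only the numeric columns. For text columns such as `Plan` or `Churn`, a separate call can report counts and the most frequent value. The sketch below runs on a small made-up frame (`toy`), not the real dataset.

```python
import pandas as pd

# Toy frame with one numeric and two text columns (values are made up)
toy = pd.DataFrame({
    "Charge Amount": [100, 100, 200, 100],
    "Plan": ["pre-paid", "pre-paid", "post-paid", "pre-paid"],
    "Churn": ["no", "yes", "yes", "yes"],
})

# Numeric columns: count, mean, std, min, quartiles, max
numeric_summary = toy.describe()
print(numeric_summary)

# Non-numeric columns: count, unique, top (most frequent value), freq
text_summary = toy.describe(exclude=["number"])
print(text_summary)
```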

### 2. Visualizing Class Distribution
Let's see the churn distribution.

In [None]:
# Churn distribution
sns.countplot(x='Churn', data=data)
plt.title('Churn Distribution')
plt.show()
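
The count plot shows the class balance visually; the same information can be read as numbers with `value_counts`. The labels below are made up for illustration and do not reflect the real dataset's balance.

```python
import pandas as pd

# Made-up churn labels, standing in for data['Churn']
churn = pd.Series(["no", "yes", "yes", "yes", "yes"], name="Churn")

# Absolute number of rows per class
counts = churn.value_counts()
print(counts)

# Fraction of rows per class (sums to 1.0)
shares = churn.value_counts(normalize=True)
print(shares)
```

A strongly uneven split is worth noting in the report, because plain accuracy can be misleading on imbalanced classes.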

### 3. Histograms for Churn by Age Group and Charge Amount

In [None]:
# Age group histogram (uses the 'Age Group' column to match the plot title)
sns.histplot(data=data, x='Age Group', hue='Churn', multiple='stack', kde=False)
plt.title('Age Group vs Churn')
plt.show()

# Charge amount histogram (the dataset's column is named 'Charge Amount')
sns.histplot(data=data, x='Charge Amount', hue='Churn', multiple='stack', kde=False)
plt.title('Charge Amount vs Churn')
plt.show()

## Section 5: Correlation Heatmap
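Analyze pairwise correlations between the numeric features.

In [None]:
# Correlation heatmap; numeric_only=True skips text columns such as Plan and Status
# (without it, recent pandas versions raise an error on non-numeric data)
corr_matrix = data.corr(numeric_only=True)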
sns.heatmap(corr_matrix, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Feature Correlations")
plt.show()
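
A heatmap can be hard to read when there are many features. One option is to rank the features by the strength of their correlation with the target. The sketch below uses a made-up frame (`toy`) and assumes `Churn` has been mapped to 0/1 so that it appears in the correlation matrix.

```python
import pandas as pd

# Toy frame standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "Call Failure": [3, 8, 0, 10, 10, 2],
    "Freq. of use": [25, 65, 0, 54, 60, 40],
    "Plan": ["pre-paid"] * 6,
    "Churn": ["no", "yes", "yes", "yes", "yes", "no"],
})

# Encode the target numerically so it is included in the correlation matrix
toy["Churn"] = toy["Churn"].map({"no": 0, "yes": 1})

# Correlation of each numeric feature with Churn, strongest first (by absolute value)
corr_with_churn = (
    toy.corr(numeric_only=True)["Churn"]
    .drop("Churn")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_churn)
```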

## Section 6: Splitting Data for Training and Testing
We'll prepare the data for machine learning.

In [None]:
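from sklearn.model_selection import train_test_split

# Features: drop the target and the ID column (an identifier, not a predictor),
# then one-hot encode text columns such as Complains, Plan, and Status
X = pd.get_dummies(data.drop(['ID', 'Churn'], axis=1), drop_first=True, dtype=int)
# Target: encode 'no'/'yes' as 0/1
y = data['Churn'].map({'no': 0, 'yes': 1})
# Split the dataset into training (70%) and testing (30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)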

print(f"Training Set: {X_train.shape}")
print(f"Testing Set: {X_test.shape}")
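
When the classes are imbalanced, a plain random split can leave the training and testing sets with different churn rates. Passing `stratify` to `train_test_split` keeps the class proportions the same in both parts. The sketch below uses a made-up frame with 25% positive labels; whether stratification is needed for the real dataset is an assumption to check against the class distribution.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data: 40 rows, 10 churned (1) and 30 not churned (0); values are made up
toy = pd.DataFrame({
    "Call Failure": range(40),
    "Churn": [1] * 10 + [0] * 30,
})
X = toy[["Call Failure"]]
y = toy["Churn"]

# stratify=y preserves the 25% churn rate in both the training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(f"Train churn rate: {y_train.mean():.2f}")
print(f"Test churn rate:  {y_test.mean():.2f}")
```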

## Section 7: Linear Regression Example
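Train a regression model on the data. Linear regression requires numeric features and a numeric target, so text columns such as `Churn` must be encoded as numbers before fitting.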

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Train a Linear Regression Model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions and evaluate
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)

print(f"Mean Squared Error: {mse}")
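
Mean squared error is expressed in squared units, which makes it hard to interpret on its own. Two common companions are the root mean squared error (same units as the target) and the R² score. The sketch below computes all three on made-up numbers, not on the model's actual predictions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true values and predictions, standing in for y_test and y_pred
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 7.5, 9.0])

mse = mean_squared_error(y_true, y_hat)  # average squared error
rmse = np.sqrt(mse)                      # same units as the target
r2 = r2_score(y_true, y_hat)             # 1.0 means a perfect fit

print(f"MSE:  {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
print(f"R^2:  {r2:.3f}")
```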

## Section 8: Next Steps
1. Extend the notebook to include classification models.
2. Use additional visualizations for better insights.
3. Export results and include explanations for your report.
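
As a starting point for the first step, a classification model can follow the same split, fit, predict, evaluate pattern used for regression. The sketch below trains a logistic regression on synthetic data from `make_classification`; the choice of model and the synthetic data are assumptions for illustration, not the assignment's required method.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (a stand-in for the encoded churn features)
X, y = make_classification(n_samples=300, n_features=5, random_state=42)

# Same 70/30 split as before, stratified to keep class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit the classifier and predict labels for the test set
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Report overall accuracy plus per-class precision, recall, and F1
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred))
```

To apply this to the churn data, replace the synthetic `X` and `y` with the numerically encoded features and target.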

---

Thank you for following this tutorial!