# Hello Machine Learning!
**Week 1, Day 1**  
**Topic:** Introduction to ML with Scikit-learn

## Learning Objectives
1. Set up your Python environment for ML
2. Load and explore a simple dataset
3. Train your first machine learning model
4. Make predictions and evaluate the model

## Part 1: Environment Setup

```python
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# IPython magic (notebook only): render Matplotlib figures inside the notebook
%matplotlib inline
```
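scikit-learn sometimes renames or deprecates parameters between releases, so it can help to record which library versions the notebook actually runs against. A minimal sketch that prints each installed version string (the `versions` dictionary is just an illustrative name):

```python
# Print the installed versions of the core libraries used in this notebook
import matplotlib
import numpy as np
import pandas as pd
import sklearn

versions = {
    "numpy": np.__version__,
    "pandas": pd.__version__,
    "matplotlib": matplotlib.__version__,
    "scikit-learn": sklearn.__version__,
}
for name, version in versions.items():
    print(f"{name:>12}: {version}")
```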

## Part 2: Load and Explore the Iris Dataset

```python
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data  # Features
y = iris.target  # Target variable

# Print feature names and target names
print("Feature names:\n", iris.feature_names)
print("\nTarget names:", iris.target_names)

# Print the shape of the data
print("\nShape of X:", X.shape)
print("Shape of y:", y.shape)
```
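The setup cell imports pandas, but the cell above works only with raw NumPy arrays. Wrapping the data in a labelled DataFrame makes it easier to look at individual rows, summary statistics, and class balance. A possible sketch (the `df` name and the `species` column are our own choices):

```python
import pandas as pd
from sklearn import datasets

# Load Iris and wrap the feature matrix in a DataFrame with named columns
iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map the integer class labels (0, 1, 2) to readable species names
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

print(df.head())                     # first five rows
print(df.describe())                 # summary statistics for each numeric feature
print(df["species"].value_counts())  # number of samples per class
```

The class counts show that Iris is balanced, with 50 samples per species, which matters when judging whether an accuracy score is good.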

## Part 3: Data Visualization

```python
# Create a scatter plot of two features
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title('Iris Dataset - Sepal Length vs Sepal Width')
plt.colorbar(label='Class')
plt.show()
```
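In the sepal plot, two of the three species overlap considerably, and the numeric colorbar does not say which color is which species. As one further exploration step (the choice of petal features is ours, not part of the original notebook), the sketch below plots petal length against petal width with one labelled series per species, so the legend shows species names:

```python
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data, iris.target

fig, ax = plt.subplots(figsize=(8, 6))
# Plot each class separately so it gets its own legend entry
for class_index, class_name in enumerate(iris.target_names):
    mask = y == class_index
    ax.scatter(X[mask, 2], X[mask, 3], label=class_name)
ax.set_xlabel(iris.feature_names[2])
ax.set_ylabel(iris.feature_names[3])
ax.set_title('Iris Dataset - Petal Length vs Petal Width')
ax.legend(title='Species')
plt.show()
```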

## Part 4: Train Your First Model

```python
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the model
model = DecisionTreeClassifier(random_state=42)

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model accuracy: {accuracy:.2f}")
```
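Accuracy condenses performance into a single number. A confusion matrix and a per-class report show which species, if any, get confused with each other, and `model.predict` also works on measurements that are not in the dataset. A sketch of both, assuming the same split and model settings as the cell above; the `new_flower` measurements are made up for illustration:

```python
from sklearn import datasets
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Recreate the same split and model as above so this cell runs on its own
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Per-class precision, recall and F1 score
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Hypothetical new flower, measured in cm, in the order of iris.feature_names
new_flower = [[5.0, 3.4, 1.5, 0.2]]
predicted = model.predict(new_flower)
print("Predicted species:", iris.target_names[predicted[0]])
```

Setosa petals are much shorter and narrower than those of the other two species, so a flower with these petal measurements should be classified as setosa.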

## Part 5: Analysis and Discussion

**Question 1:** What is the accuracy of your model? Is this good or bad? Why?

*Your answer here*

**Question 2:** What are the features used in this model? Do you think using different features would improve the model?

*Your answer here*

**Question 3:** What other types of models could you try for this classification problem?

*Your answer here*

## Part 6: Additional Challenges (Optional)

1. Train the model on a different subset of features (for example, only the petal measurements) and compare the accuracy
2. Experiment with different values for `test_size` in `train_test_split`
3. Try a different classifier from scikit-learn (e.g., `KNeighborsClassifier` or `SVC`)
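With only 150 samples, the accuracy from a single train/test split can change noticeably with `test_size` or `random_state`. One possible starting point for challenges 2 and 3, assuming 5-fold cross-validation is acceptable (it is not part of the original assignment), compares the three classifiers mentioned above by their mean cross-validated accuracy using default hyperparameters:

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Candidate models; hyperparameters are left at scikit-learn defaults
models = {
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "k-nearest neighbors": KNeighborsClassifier(),
    "Support vector machine": SVC(),
}

results = {}
for name, clf in models.items():
    # 5-fold cross-validation: every sample is used for testing exactly once
    scores = cross_val_score(clf, X, y, cv=5)
    results[name] = scores
    print(f"{name:<24} mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

Reporting the standard deviation alongside the mean makes it easier to tell whether a difference between models is larger than the fold-to-fold variation.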

## Submission Instructions
1. Complete all code and markdown cells
2. Restart the kernel and run all cells to verify they work
3. Save the notebook
4. Export as HTML/PDF (File > Export Notebook As...)
5. Submit both the .ipynb file and the exported HTML/PDF file through the assignment submission link