# Machine Learning - Lab 1
This lab covers the essentials of:
1. Setting up your development environment (virtual environment)
2. Python basics and core libraries (NumPy, Pandas, Matplotlib, SciPy)



# Introduction to Conda

## What is Conda?
Conda is an open-source package management system and environment management system. It simplifies installing and managing software packages and their dependencies. It is widely used for Python projects, but it can also manage packages written in other languages.

## Why use Conda?
Conda allows you to create isolated environments for different projects, ensuring that each project has its own dependencies and packages without affecting others.

## Task
1. Install Conda by following the instructions on the official website: [Conda Installation Guide](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html).
2. Verify the installation by running `conda --version` in your terminal.

## Assessment Question
- Why is it beneficial to use Conda for Python environment management, especially in machine learning and data science projects?


In [None]:
Conda is useful in machine learning and data science because it creates isolated environments, manages both Python packages and non-Python dependencies (such as compiled C/C++ libraries), and supports reproducibility. This makes it easier to handle complex setups and avoid version conflicts across different projects.



# Setting up a Python Environment with Conda

## Creating and Managing Environments
Conda allows you to create separate environments for different projects, each with its own set of packages.

## Task
1. Create a new Conda environment named 'ml_lab1' with Python 3.8: `conda create --name ml_lab1 python=3.8`
2. Activate the environment: `conda activate ml_lab1`
3. Install NumPy, Pandas, Matplotlib, and SciPy in the environment: `conda install numpy pandas matplotlib scipy`

## Assessment Question
- Describe the steps to create a new environment in Conda and install a specific version of Python and a few packages.


In [None]:
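To create a new Conda environment with a specific Python version, use `conda create --name myenv python=3.8`, then activate it with `conda activate myenv`.
Install packages with `conda install package_name` (several packages can be listed in one command, and a version can be pinned with `package_name=version`), and check installed packages with `conda list`. Packages can also be installed at creation time, e.g. `conda create --name myenv python=3.8 numpy pandas`.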
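Once the environment is active, it can be useful to confirm from inside Python (for example, in the notebook) which environment you are actually running in. The sketch below is an illustration rather than part of the lab: `describe_environment` is a hypothetical helper name. It reads `CONDA_DEFAULT_ENV`, which `conda activate` sets in the shell, and `sys.prefix`, which points at the running interpreter's installation. In Jupyter, `sys.prefix` is the more reliable of the two, because the kernel may use a different interpreter than the shell that launched it.

```python
import os
import sys


def describe_environment():
    """Return a short summary of the active conda environment and interpreter (illustrative helper)."""
    # CONDA_DEFAULT_ENV is set by `conda activate`; it is absent outside a conda environment.
    env_name = os.environ.get("CONDA_DEFAULT_ENV", "<no conda environment active>")
    return {
        "conda_env": env_name,
        "python_version": sys.version.split()[0],
        # sys.prefix is the installation directory of the interpreter running this code.
        "interpreter_prefix": sys.prefix,
    }


info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

Inside `ml_lab1`, you would expect `conda_env` to read `ml_lab1` and `interpreter_prefix` to end with `envs/ml_lab1` (or `envs\ml_lab1` on Windows).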


# Introduction to Python Basics

Python is a popular programming language for data analysis and machine learning. In this section, we'll cover basic Python syntax and data types.

## Task
Write a Python script that takes a list of numbers and returns a list containing only the even numbers.

## Assessment Question
- Explain the difference between a list and a tuple in Python.


In [None]:
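# Task: return only the even numbers from a list.
def get_even_numbers(numbers):
    """Return a new list containing only the even values in `numbers`."""
    return [num for num in numbers if num % 2 == 0]

# Example:
numbers = [1, 2, 3, 4, 5, 6]
even_numbers = get_even_numbers(numbers)
print(even_numbers)  # [2, 4, 6]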

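Differences between a list and a tuple:
In Python, lists are mutable: their elements can be modified, added, or removed after creation. Tuples are immutable and cannot be altered once created.
Lists are written with square brackets `[]`, while tuples are usually written with parentheses `()`. Strictly, it is the comma that creates a tuple, so a one-element tuple is written `(x,)`.
Lists suit collections that may change, while tuples suit fixed records. Because a tuple of hashable items is itself hashable, it can also be used as a dictionary key, which a list cannot.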
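The differences above can be checked directly in a few lines. The variable names and values here are illustrative only.

```python
# Lists are mutable: elements can be changed, appended, or removed in place.
scores = [90, 85, 77]
scores[0] = 95
scores.append(60)
print(scores)  # [95, 85, 77, 60]

# Tuples are immutable: item assignment raises a TypeError.
point = (3, 4)
try:
    point[0] = 10
except TypeError as err:
    print("Cannot modify a tuple:", err)

# A tuple of hashable items is hashable, so it can be a dictionary key (a list cannot).
distances = {(0, 0): 0.0, point: 5.0}
print(distances[(3, 4)])  # 5.0

# A one-element tuple needs a trailing comma; parentheses alone just group an expression.
single = (42,)
not_a_tuple = (42)
print(type(single).__name__, type(not_a_tuple).__name__)  # tuple int
```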


# Working with NumPy

NumPy is a fundamental package for scientific computing with Python. It provides support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

## Task
Create a NumPy array and perform basic array operations (addition, multiplication).

## Assessment Question
- What are the advantages of using NumPy arrays over standard Python lists for numerical data?


In [None]:
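import numpy as np

# Create a 1-D array from a Python list.
arr = np.array([1, 2, 3, 4, 5])

# Arithmetic with a scalar is applied to every element (no explicit loop needed).
arr_addition = arr + 5        # each element + 5 -> 6, 7, 8, 9, 10
arr_multiplication = arr * 2  # each element * 2 -> 2, 4, 6, 8, 10

print("Array after addition:", arr_addition)
print("Array after multiplication:", arr_multiplication)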
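Since NumPy also handles matrices, "multiplication" can mean two different things. The sketch below, using small illustrative 2×2 arrays, contrasts element-wise `*` with matrix multiplication `@`, and shows broadcasting, where a 1-D array is combined with each row of a 2-D array. `.tolist()` is used only so the printed results are plain Python lists.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Element-wise operations pair up entries at the same position.
elementwise_sum = a + b
elementwise_product = a * b

# Matrix multiplication uses the @ operator (equivalent to np.matmul(a, b)).
matrix_product = a @ b

# Broadcasting: the 1-D array `row` is added to each row of `a`.
row = np.array([10, 20])
broadcast_sum = a + row

print(elementwise_sum.tolist())      # [[6, 8], [10, 12]]
print(elementwise_product.tolist())  # [[5, 12], [21, 32]]
print(matrix_product.tolist())       # [[19, 22], [43, 50]]
print(broadcast_sum.tolist())        # [[11, 22], [13, 24]]
```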

Advantages of using NumPy arrays over standard Python lists for numerical data:
NumPy arrays are more efficient than Python lists for numerical tasks. An array stores values of one fixed-size type in a single contiguous block of memory, whereas a list holds references to separate Python objects, so arrays consume less memory.
Array operations are also vectorized: they run element-wise in compiled code, so calculations across entire arrays are much faster than explicit Python loops and read like the underlying mathematics.
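The memory and speed points can be illustrated with a quick comparison. This is a sketch with an arbitrary array size of one million; it checks that the list-comprehension and vectorized versions give identical results and shows the fixed 8 bytes per element of an `int64` array.

```python
import numpy as np

values = list(range(1_000_000))
arr = np.array(values, dtype=np.int64)

# Python list: the comprehension visits each element in the interpreter.
squared_list = [v * v for v in values]

# NumPy: one vectorized expression; the loop runs in compiled code.
squared_arr = arr * arr

print(squared_list[:5], squared_arr[:5].tolist())  # [0, 1, 4, 9, 16] [0, 1, 4, 9, 16]

# Each int64 element occupies exactly 8 bytes in one contiguous buffer.
print(arr.itemsize, arr.nbytes)  # 8 8000000
```

To compare speed in Jupyter, you can time each line with the `%timeit` magic. Exact timings depend on the machine, so none are quoted here.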





# Data Manipulation with Pandas

Pandas is an open-source data analysis and manipulation tool, built on top of the Python programming language.

## Task
Read a CSV file into a DataFrame, display the first few rows, and filter the data based on a condition.

## Assessment Question
- Describe how a Pandas DataFrame is different from a NumPy array.


In [None]:
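import pandas as pd

# 'your_file.csv' is a placeholder; point it at your own CSV file.
df = pd.read_csv('your_file.csv')

# Show the first five rows (the default for head()).
print(df.head())

# Keep only rows where the (assumed) 'age' column is greater than 30.
filtered_df = df[df['age'] > 30]
print(filtered_df)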
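Because `your_file.csv` is a placeholder, here is a self-contained version you can run as-is. It reads hypothetical sample data from an in-memory string with `io.StringIO`, which `pd.read_csv` accepts in place of a file path, and adds a combined condition to the filter.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
csv_text = """name,age,city
Alice,34,London
Bob,28,Paris
Chen,45,Toronto
Dana,30,Berlin
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.head())

# Boolean mask: True for rows where age is strictly greater than 30.
filtered_df = df[df["age"] > 30]
print(filtered_df)

# Conditions combine with & (and) or | (or); each condition needs its own parentheses.
older_not_in_london = df[(df["age"] > 30) & (df["city"] != "London")]
print(older_not_in_london)
```

The same filter can also be written as `df.query("age > 30")`, which some people find easier to read for longer conditions.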

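A Pandas DataFrame is a labeled, two-dimensional data structure whose columns can hold different data types, and it provides extensive data manipulation capabilities such as grouping, merging, and reshaping. A NumPy array is a homogeneous, multi-dimensional array in which every element shares one data type, which makes it optimized for numerical operations.
Pandas also offers built-in handling of missing data (for example `isna`, `fillna`, and `dropna`) and flexible label-based indexing of rows and columns, making it better suited for structured, tabular data with mixed types.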
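The contrast can be seen on a small, made-up table. Each DataFrame column keeps its own dtype and missing values are skipped by default, while converting the same data to a NumPy array forces one common dtype (`object` for mixed columns).

```python
import numpy as np
import pandas as pd

# Made-up example data with mixed column types and one missing value.
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Chen"],
    "age": [34, 28, np.nan],
    "member": [True, False, True],
})

# Each column keeps its own dtype (the NaN makes 'age' a float column).
print(df.dtypes)

# Built-in missing-data handling: count NaNs, and mean() skips them by default.
print(df["age"].isna().sum())  # 1
print(df["age"].mean())        # 31.0

# A NumPy array must have a single dtype; mixed columns become 'object'.
arr = df.to_numpy()
print(arr.dtype)  # object

# Label-based (.loc) versus position-based (.iloc) access.
print(df.loc[0, "name"], df.iloc[0, 0])  # Alice Alice
```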






# Data Visualization with Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.

## Task
Create a basic plot (line plot or bar chart) using data from a Pandas DataFrame.

## Assessment Question
- Why is data visualization important in data analysis?


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Small example dataset.
data = {'Category': ['A', 'B', 'C', 'D'],
        'Value': [10, 20, 15, 25]}
df = pd.DataFrame(data)

# Bar chart: suited to comparing values across discrete categories.
df.plot(kind='bar', x='Category', y='Value')
plt.title('Bar Chart of Categories')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()

# Line plot of the same data; a line implies an ordering, so it fits best when x is ordered (e.g. time).
df.plot(kind='line', x='Category', y='Value', marker='o')
plt.title('Line Plot of Categories')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()

Why data visualization is important in data analysis:
Data visualization is crucial because it communicates complex information effectively, making patterns, trends, and insights more accessible and understandable.
It helps identify outliers, compare groups, and spot relationships that summary statistics alone can hide, which makes data-driven decisions easier to reach and to explain.
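A classic illustration of why plots matter is Anscombe's quartet: four small datasets whose means and correlations are almost identical, yet which look completely different when plotted. The sketch below uses the published quartet values. It prints the summary statistics and then draws the four scatter plots with Matplotlib.

```python
import numpy as np
import matplotlib.pyplot as plt

# Anscombe's quartet: datasets I-III share the same x values; dataset IV has its own.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Summary statistics: nearly the same for all four datasets.
stats = {}
for name, (xs, ys) in quartet.items():
    xs, ys = np.array(xs), np.array(ys)
    r = np.corrcoef(xs, ys)[0, 1]
    stats[name] = (xs.mean(), ys.mean(), r)
    print(f"{name}: mean x = {xs.mean():.2f}, mean y = {ys.mean():.2f}, r = {r:.3f}")

# The plots reveal what the numbers hide: a line, a curve, an outlier, a single leverage point.
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
for ax, (name, (xs, ys)) in zip(axes.flat, quartet.items()):
    ax.scatter(xs, ys)
    ax.set_title(f"Dataset {name}")
plt.tight_layout()
plt.show()
```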



# Basic File Operations in Python

File operations are fundamental for data processing. In this section, we'll cover how to read from and write to files in Python.

## Task
Write a Python script to read a text file and count the frequency of each word in the file.

## Assessment Question
- Discuss the different modes of opening a file in Python.


In [None]:
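from collections import Counter
import re

# 'your_file.txt' is a placeholder; replace it with the path to your text file.
with open('your_file.txt', 'r', encoding='utf-8') as file:
    text = file.read()

# Lowercase the text so 'The' and 'the' count as the same word,
# then extract runs of word characters.
words = re.findall(r'\b\w+\b', text.lower())
word_counts = Counter(words)

# Print each word with its count (word_counts.most_common() would sort by frequency).
for word, count in word_counts.items():
    print(f'{word}: {count}')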
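The script above loads the whole file into memory with `read()`. For large files, a variant that reads one line at a time keeps memory use low, because file objects yield lines lazily. In the sketch below, `count_words` is a hypothetical helper name and the sample text is made up. The demo writes to a temporary file so it runs without `your_file.txt`.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

WORD_PATTERN = re.compile(r"\w+")


def count_words(path, encoding="utf-8"):
    """Count word frequencies in a text file, reading it one line at a time."""
    counts = Counter()
    with open(path, "r", encoding=encoding) as file:
        for line in file:  # iterating a file object yields one line at a time
            counts.update(WORD_PATTERN.findall(line.lower()))
    return counts


# Demo with a throwaway file containing made-up sample text.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("The cat saw the dog.\nThe dog ran.\n", encoding="utf-8")
    counts = count_words(sample)

# Show the three most frequent words.
for word, count in counts.most_common(3):
    print(f"{word}: {count}")
```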

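Different modes of opening a file in Python:
- `'r'`: Read only (the default); the file must already exist.
- `'w'`: Write; creates the file, or truncates (overwrites) it if it already exists.
- `'a'`: Append; writes go to the end of the file, which is created if missing.
- `'x'`: Exclusive creation; raises `FileExistsError` if the file already exists.
- `'b'`: Binary mode, combined with another mode (e.g. `'rb'`, `'wb'`) to read or write `bytes` instead of `str`.
- `'t'`: Text mode (the default), so `'rt'` is the same as `'r'`.
- `'r+'`: Read and write; the file must exist and is not truncated.
- `'w+'`: Write and read; creates the file or truncates it.
- `'a+'`: Append and read; writes always go to the end of the file.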
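The modes listed above can be exercised in a temporary directory so nothing on disk is affected. The file name `notes.txt` and its contents are illustrative. The sketch shows that `'w'` creates, `'a'` appends, `'rb'` returns `bytes`, `'x'` refuses to overwrite, and `'r+'` overwrites in place without truncating.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")

    # 'w' creates the file (or truncates an existing one) and writes text.
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\n")

    # 'a' appends to the end without touching existing content.
    with open(path, "a", encoding="utf-8") as f:
        f.write("second line\n")

    # 'r' reads text.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

    # 'rb' combines read with binary mode and returns bytes instead of str.
    with open(path, "rb") as f:
        raw = f.read()

    # 'x' raises FileExistsError because the file already exists.
    try:
        with open(path, "x", encoding="utf-8") as f:
            f.write("never written")
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

    # 'r+' reads and writes without truncating; writing at the start overwrites in place.
    with open(path, "r+", encoding="utf-8") as f:
        f.write("FIRST")
        f.seek(0)
        updated = f.read()

print(repr(text))                        # 'first line\nsecond line\n'
print(type(raw).__name__, exclusive_failed)  # bytes True
print(repr(updated))                     # 'FIRST line\nsecond line\n'
```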






# Introduction to SciPy

SciPy is an open-source Python library for mathematics, science, and engineering, built on top of NumPy.

## Task
Demonstrate a simple mathematical operation using SciPy, like solving a linear equation.

## Assessment Question
- What is the role of SciPy in scientific computing with Python?


In [None]:
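from scipy.linalg import solve

# Solve the system  3x + 2y = 5
#                    x + 2y = 4
A = [[3, 2], [1, 2]]  # coefficient matrix
b = [5, 4]            # right-hand side
x = solve(A, b)
print('Solution:', x)  # x = 0.5, y = 1.75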
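A good habit after solving a linear system is to substitute the solution back and check that it reproduces the right-hand side. The sketch below does that with `np.allclose`, which tolerates floating-point rounding. It also compares the result with `numpy.linalg.solve`, which handles this basic case equally well. `scipy.linalg` adds further options and decompositions on top of NumPy's linear algebra routines.

```python
import numpy as np
from scipy.linalg import solve

A = np.array([[3, 2], [1, 2]], dtype=float)
b = np.array([5, 4], dtype=float)
x = solve(A, b)

# Substituting back: A @ x should reproduce b up to floating-point rounding.
print(np.allclose(A @ x, b))  # True

# NumPy's own solver gives the same answer for this small system.
print(np.allclose(x, np.linalg.solve(A, b)))  # True
```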

Role of SciPy in scientific computing with Python:
SciPy extends Python's capabilities with tools for complex mathematical computations, organized into submodules such as `scipy.optimize` (optimization), `scipy.integrate` (integration), `scipy.interpolate` (interpolation), `scipy.linalg` (linear algebra), and `scipy.stats` (statistics).
It builds on NumPy arrays, providing more advanced functionality for scientific and engineering tasks.
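Two more SciPy tasks, beyond linear algebra, give a sense of its scope: numerical integration with `scipy.integrate.quad` and one-dimensional minimization with `scipy.optimize.minimize_scalar`. The functions below were chosen because their exact answers are known: the integral of sin(x) from 0 to π is 2, and (x − 2)² + 1 has its minimum at x = 2.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: quad returns the estimate and an estimate of its absolute error.
area, abs_error = integrate.quad(np.sin, 0, np.pi)
print(f"integral = {area:.6f} (estimated error {abs_error:.1e})")

# Scalar minimization of f(x) = (x - 2)^2 + 1, whose minimum is at x = 2 with f(2) = 1.
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)
print(f"minimum at x = {result.x:.4f}, f(x) = {result.fun:.4f}")
```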





## Submission
Submit a link to your completed Jupyter Notebook file hosted on your private GitHub repository through the submission link in Blackboard.