This repository contains a collection of data analysis and machine learning projects. Each project is presented in a Jupyter Notebook (.ipynb) and uses various datasets (.csv files).
Here is a brief description of the datasets used in these projects:
- iris.csv: A classic dataset in machine learning, containing measurements of iris flowers and their species.
- salesdata.csv: Sales data from a retail company.
- TitanicData.csv: The famous Titanic dataset, containing information about passengers and their survival status.
- wbcd.csv: Wisconsin Breast Cancer Dataset, used for breast cancer diagnosis.
- winequality_red.csv: Data on red wine, including chemical properties and quality ratings.
- winequality_white.csv: Data on white wine, including chemical properties and quality ratings.
- Global Health Statistics.csv: A large dataset with global health statistics. Due to its size, it is not loaded in the notebooks.
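Because the largest file is not loaded in the notebooks, it can help to peek at it without reading it fully into memory. The following is a minimal sketch using only the Python standard library; the helper name `preview_csv`, its parameters, and the assumption that the file is UTF-8 encoded are illustrative and not taken from the notebooks.

```python
import csv
from itertools import islice


def preview_csv(path, n_rows=5, encoding="utf-8"):
    """Return the header and the first n_rows data rows of a CSV file.

    Rows are read lazily, so the rest of the file is never loaded into memory.
    (Hypothetical helper for illustration; not part of the notebooks.)
    """
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no rows
        rows = list(islice(reader, n_rows))
    return header, rows


# Example call (assumes the file sits in the current working directory):
# header, rows = preview_csv("Global Health Statistics.csv", n_rows=3)
```

The smaller datasets can simply be loaded whole, for example with `pandas.read_csv("iris.csv")`; passing `nrows=` to `pandas.read_csv` is another way to limit how much of a large file is read.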
The following is a list of the Jupyter Notebooks in this repository, with a brief description of their content:
- LAB01.ipynb: Introduction to data analysis with Python, using libraries like pandas and numpy.
- LAB02.ipynb: Data visualization with matplotlib and seaborn.
- LAB03.ipynb: Data cleaning and preparation techniques.
- LAB04-1.ipynb, LAB04-2.ipynb, LAB04.ipynb: Introduction to machine learning with scikit-learn.
- LAB05-1,2,3.ipynb, LAB05-4.ipynb: Implementation of various machine learning models.
- LAB06-5.ipynb, LAB06.ipynb: Advanced machine learning concepts, including model evaluation and hyperparameter tuning.
- LAB07-1,2,3.ipynb, LAB07-4.ipynb: Unsupervised learning techniques, such as clustering.
- LAB08.ipynb: Natural Language Processing (NLP) with text data.
- LAB09.ipynb: Advanced machine learning topics.
- SGD_01.ipynb: A notebook focused on the Stochastic Gradient Descent (SGD) algorithm.
To run these notebooks, you will need to have Python and Jupyter Notebook installed, as well as the libraries mentioned in each notebook (e.g., pandas, numpy, scikit-learn, matplotlib, seaborn).
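Before opening the notebooks, a quick check of the environment can save some trial and error. The sketch below reports which of the libraries named above cannot be found; the helper name `missing_packages` and the `REQUIRED` list are illustrative assumptions, and individual notebooks may need additional packages.

```python
from importlib.util import find_spec

# Import names for the libraries named above. Note that scikit-learn is
# imported as "sklearn". This list is an assumption; a notebook may need more.
REQUIRED = ["pandas", "numpy", "sklearn", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

Any package reported as missing can then be installed with your usual package manager (for scikit-learn, the package to install is named `scikit-learn`, not `sklearn`).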