This project demonstrates an end-to-end Data Engineering and Machine Learning workflow using Python. It covers:
- Data Exploration: Using `pandas` to inspect, clean, and aggregate student performance data.
- Data Visualization: Creating insightful charts using both `matplotlib` and `seaborn` to understand feature distributions and relationships.
- Machine Learning: Building, evaluating, and interpreting a Logistic Regression model using `scikit-learn` to predict student success (Pass/Fail).
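As a rough illustration of the workflow listed above, the following minimal sketch aggregates a small table with `pandas` and fits a Logistic Regression model with `scikit-learn`. It is not the notebook's code: it uses synthetic data in place of `students.csv`, and the column names (`attendance`, `study_hours`, `grade`, `passed`) and the pass threshold of 50 are assumptions made only for this example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for students.csv; all column names here are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "attendance": rng.uniform(50, 100, n),  # percent of classes attended
    "study_hours": rng.uniform(0, 20, n),   # hours studied per week
})
# Made-up relationship between the features and the grade, plus noise.
df["grade"] = 0.4 * df["attendance"] + 2.0 * df["study_hours"] + rng.normal(0, 5, n)
df["passed"] = (df["grade"] >= 50).astype(int)  # assumed pass threshold

# Exploration: compare average feature values for failing vs. passing students.
summary = df.groupby("passed")[["attendance", "study_hours"]].mean()
print(summary)

# Modeling: hold out 25% of the rows for evaluation.
X = df[["attendance", "study_hours"]]
y = df["passed"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluation and interpretation: accuracy on held-out data, one coefficient per feature.
accuracy = accuracy_score(y_test, model.predict(X_test))
coefficients = dict(zip(X.columns, model.coef_[0]))
print(f"Test accuracy: {accuracy:.2f}")
print("Coefficients:", coefficients)
```

A positive coefficient means that higher values of that feature raise the predicted probability of passing. To try this on the real dataset, replace the synthetic `df` with `pd.read_csv("students.csv")` and adjust the column names to match the file.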
Repository contents:
- `part4_visualization_ml.ipynb`: The main Jupyter Notebook containing all analysis, visualizations, and ML code.
- `students.csv`: The dataset containing student grades, attendance, and study hours.
- `*.png`: Various plot images generated during the visualization tasks.

To run the project:
- Clone the repository.
- Ensure you have the required libraries installed (`pip install pandas matplotlib seaborn scikit-learn`).
- Open `part4_visualization_ml.ipynb` in Jupyter Notebook or VS Code and run the cells sequentially.
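
Before opening the notebook, it can help to confirm that the required libraries are importable. The snippet below is an optional sketch, not part of the repository; the helper name `missing_packages` is hypothetical. Note that `scikit-learn` is imported under the name `sklearn`.

```python
import importlib.util

# Map each pip package name to the module name used when importing it.
REQUIRED = {
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Missing libraries. Try: pip install " + " ".join(missing))
else:
    print("All required libraries are installed.")
```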