Author: Caio Forte Ribeiro
This repository contains all files required as part of the assessment for the module Fundamentals of Data Analysis at GMIT (Winter 2021-22). All code was written in Python (v. 3.8.8, released on Feb 19, 2021).
Here you will find the following files:
- cao.ipynb, a Jupyter notebook with a comparison of CAO points for 2019 to 2021.
- pyplot.ipynb, a Jupyter notebook with an explanation of the matplotlib.pyplot package and examples of 3 plots using it.
- requirements.txt, containing all modules required to run the notebooks.
- a data folder with all input and output datasets for the CAO notebook.
Create a repository containing 2 Jupyter notebooks:
- cao.ipynb, with a clear and concise overview of how to load CAO points information from the CAO website into a pandas data frame, a comparison of CAO points for 2019, 2020, and 2021, as well as relevant and appropriate plots.
- pyplot.ipynb, with a clear and concise overview of the matplotlib.pyplot Python package, and an in-depth explanation of 3 interesting plots from the package.
The repository should also contain all relevant data and a requirements.txt file to enable someone to run the notebooks with minimal configuration.
This notebook starts by pulling the data for each year from the CAO website, which comes in different formats (.xlsx for 2021 and 2020, .pdf for 2019), and saving a time-stamped local copy of the original data. The data is then loaded into a pandas dataframe using pd.read_excel() for 2021 and 2020 and camelot for 2019. Each dataframe is then cleaned up (keeping only L8 courses, removing string characters from columns with points, and keeping track of which courses had additional entry requirements) and saved as a csv file in the data folder.
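The time-stamping and clean-up steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (code, level, points), the sample values, the helper names, and the assumption that a '#' marks an additional entry requirement are all hypothetical.

```python
from datetime import datetime
from pathlib import Path

import pandas as pd


def timestamped_path(folder, stem, suffix, now=None):
    """Build a file path such as data/cao2021_20211101_093000.xlsx (hypothetical naming)."""
    now = now or datetime.now()
    return Path(folder) / f"{stem}_{now:%Y%m%d_%H%M%S}{suffix}"


def clean_points(df):
    """Keep Level 8 rows, flag extra requirements, and make the points column numeric."""
    # Work on a copy so the original dataframe is left untouched.
    out = df[df["level"] == 8].copy()
    # Assumption: a '#' in the points string signals an additional entry requirement.
    out["extra_requirement"] = out["points"].str.contains("#", regex=False)
    # Remove every non-digit character; entries with no digits at all become NaN.
    digits = out["points"].str.replace(r"\D", "", regex=True)
    out["points"] = pd.to_numeric(digits, errors="coerce")
    return out.reset_index(drop=True)


# Made-up rows standing in for one year of raw data.
raw = pd.DataFrame(
    {
        "code": ["AA101", "AA102", "AA103", "AA104"],
        "level": [8, 8, 7, 8],
        "points": ["#550", "320*", "300", "AQA"],
    }
)

cleaned = clean_points(raw)
print(cleaned)
print(timestamped_path("data", "cao2021", ".xlsx"))
```

Keeping the flag in its own column before stripping the characters means the information carried by the markers is not lost once the points become numeric.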
With all the clean dataframes ready, we join them into a consolidated dataframe, allcourses, to perform our comparative analysis (higher entry scores, higher Mid scores, entry scores above 600 per year) and to produce the plots.
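One possible way to build such a consolidated dataframe is an outer merge on the course code. The sketch below uses made-up data and hypothetical column names (code, points_2019, points_2020, points_2021); the notebook itself may join the dataframes differently.

```python
import pandas as pd

# Made-up per-year dataframes; real data would come from the cleaned csv files.
df2019 = pd.DataFrame({"code": ["AA101", "AA102"], "points_2019": [610, 400]})
df2020 = pd.DataFrame({"code": ["AA101", "AA103"], "points_2020": [625, 590]})
df2021 = pd.DataFrame(
    {"code": ["AA101", "AA102", "AA103"], "points_2021": [613, 420, 601]}
)

# An outer merge keeps courses that appear in only some of the years (as NaN elsewhere).
allcourses = (
    df2021.merge(df2020, on="code", how="outer")
    .merge(df2019, on="code", how="outer")
    .sort_values("code")
    .reset_index(drop=True)
)

# Count, for each year, how many courses had entry points above 600.
# Missing values compare as False, so they are not counted.
above_600 = (allcourses.filter(like="points_") > 600).sum()

print(allcourses)
print(above_600)
```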
This notebook starts by describing what the matplotlib.pyplot package is and what it is used for. It then runs through some basic plots and some of the basic pyplot functions before presenting 3 examples of plots done with the package:
- Scatterplots: We used the data from the Iris dataset to plot a series of scatterplots of Iris petal and sepal sizes. Here, we also explored the fig, ax = plt.subplots() method for plotting.
- Pie charts: A pie chart of the time spent on each activity of a project, using plt.pie() and autopct to display percentages.
- Boxplots: Again, using data from the Iris dataset to plot:
  - Petal and sepal sizes irrespective of the species.
  - Petal and sepal sizes for each species.
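As a rough illustration of the three plot types listed above, the sketch below draws a scatterplot, a pie chart with autopct, and a boxplot side by side using plt.subplots(). The numbers and labels are made up for the example and are not taken from the Iris dataset or from the notebook.

```python
import matplotlib

# Non-interactive backend so the script runs without a display.
# Omit this line inside a Jupyter notebook, where plots render inline.
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Made-up measurements and project hours, for illustration only.
lengths = [1.4, 4.7, 5.1, 1.3, 4.5, 5.9]
widths = [0.2, 1.4, 1.8, 0.2, 1.5, 2.1]
hours = [10, 25, 15]
activities = ["Planning", "Coding", "Testing"]

# One figure with three axes, one per plot type.
fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Scatterplot of width against length.
axes[0].scatter(lengths, widths)
axes[0].set_xlabel("length (cm)")
axes[0].set_ylabel("width (cm)")
axes[0].set_title("Scatterplot")

# Pie chart; autopct formats each wedge's share as a percentage.
axes[1].pie(hours, labels=activities, autopct="%1.1f%%")
axes[1].set_title("Pie chart")

# Boxplot with one box per list of values.
axes[2].boxplot([lengths, widths])
axes[2].set_xticklabels(["length", "width"])
axes[2].set_title("Boxplot")

fig.tight_layout()
```

In a notebook, ending the cell with plt.show() displays the figure.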
Quick note: Before starting with specific requirements, if you're on Windows we recommend that you install a third-party console emulator, such as Cmder. Use the command line interface if on Linux or a Mac.
To run the Jupyter notebooks, you will first need to:
- Have Python installed on your machine. You can download the latest version here.
- Install Jupyter. This can be done with one of the following 2 methods:
  - Using pip:
    pip install jupyterlab
  - Installing Anaconda, which comes with Jupyter pre-installed.
- Install Camelot to load CAO 2019 data from a PDF table into pandas. After installing the dependencies Ghostscript and ActiveTcl, type the following in your console:
  conda install -c conda-forge camelot-py
You can find specific requirements for running the code, such as additional modules to import, in the requirements.txt file in this repository.
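Before opening the notebooks, a quick way to confirm that the main packages are importable is a small check like the one below. It is only a sketch: the helper name is hypothetical, and the module list is an assumption based on the packages mentioned in this README, so requirements.txt remains the authoritative list.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Assumed import names for the packages mentioned above; adjust to match requirements.txt.
required = ["pandas", "matplotlib", "camelot"]

missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules were found.")
```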
- Clone this repository to your local machine.
- Open Cmder (or the command line interface if using Linux or Mac).
- In the terminal, type:
  jupyter lab
- The Jupyter application should open in your browser. If it doesn't, click here for troubleshooting.
- Navigate through the folders and double-click the files cao.ipynb and (or) pyplot.ipynb.
- In the Run tab from the top-left menu, click on Run All Cells.
- If you have an issue while running the notebook, try restarting the Kernel:
  - In the Run tab from the top-left menu, click on Restart Kernel and Run All Cells.
You can also use these buttons for a static view of the notebooks in nbviewer:
- CAO notebook
- Pyplot notebook
These notebooks heavily relied on the official documentation of all the packages used:
We also recommend the following sources:
Feel free to get in touch about this project by sending me an email with your suggestions.