Assignment project for Programming for Data Analysis module at GMIT, 2019.
Lecturer: Dr Brian McGinley
Author: Andrzej Kocielski
Github: andkoc001
Email: G00376291@gmit.ie, and.koc001@gmail.com
Created: 16-11-2019
This is my assignment project for the Programming for Data Analysis module at Galway-Mayo Institute of Technology, 2019.
This GitHub repository documents my research, the project progress (through git version control) and the findings of the meteorite data synthesis.
The project concerns the creation of a model of a real-world phenomenon. It involves researching the subject and identifying the key parameters and variables that affect the phenomenon, as well as the relationships between them. The project also involves running a simulation of the model and synthesising the variables into a dataset.
The high-level objectives of the project are as follows:
- Choose a real-world phenomenon that can be measured,
- Investigate the variables, their distributions and their relationships with each other,
- Simulate / synthesise a dataset describing the phenomenon in a Jupyter Notebook,
- Document the research and the data synthesis.
For the detailed project brief, see the references section below.
This project is intended to build further familiarity with data analytics. It provides practical experience of handling data in a Python environment and of working with the analytical tools. The tools used in the project include the Python language with additional libraries such as Pandas, NumPy, Seaborn, etc., as well as Jupyter Notebook.
Image source: Wikipedia
I have chosen to research and analyse the phenomenon of meteorite falls. A fall occurs when a meteoroid hits another celestial object (e.g. a planet). The variables taken into account in the simulation are:
- atmospheric entry,
- surface impact,
- meteorite mass,
- meteorite velocity,
- crater size,
- class.
More details can be found in the Notebook.
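As an illustration of what synthesising such a dataset can look like, here is a minimal sketch. It is not the notebook's actual code: the distributions, their parameters, the column names, the class labels and their proportions are all assumptions chosen for illustration, and the crater-size formula is a toy placeholder rather than a physical scaling law. Only a subset of the variables listed above is covered.

```python
import numpy as np
import pandas as pd


def synthesise_meteorites(n=1000, seed=2019):
    """Return a toy DataFrame of n synthetic meteorite events (illustrative only)."""
    rng = np.random.default_rng(seed)

    # Assumption: masses are strongly right-skewed, so draw them log-normally (kg).
    mass_kg = rng.lognormal(mean=0.0, sigma=2.0, size=n)

    # Assumption: entry velocities are roughly normal around 20 km/s,
    # clipped to an assumed plausible range of 11-72 km/s.
    velocity_km_s = np.clip(rng.normal(loc=20.0, scale=5.0, size=n), 11.0, 72.0)

    # Assumption: hypothetical class labels with made-up proportions.
    meteorite_class = rng.choice(
        ["stony", "iron", "stony-iron"], size=n, p=[0.90, 0.06, 0.04]
    )

    # Kinetic energy E = 1/2 * m * v^2, with velocity converted to m/s.
    energy_j = 0.5 * mass_kg * (velocity_km_s * 1000.0) ** 2

    # Placeholder: crater size grows with the cube root of the energy.
    # The constant 0.01 is arbitrary and has no physical meaning.
    crater_m = 0.01 * np.cbrt(energy_j)

    return pd.DataFrame(
        {
            "mass_kg": mass_kg,
            "velocity_km_s": velocity_km_s,
            "class": meteorite_class,
            "energy_j": energy_j,
            "crater_m": crater_m,
        }
    )


df = synthesise_meteorites()
print(df.describe())
```

Fixing the seed makes the synthetic dataset reproducible, so the same numbers appear on every run.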
The project is delivered via this GitHub repository.
This README.md file contains background information and introduction to the project. It should be read in conjunction with the Jupyter Notebook data_synth_1.ipynb, where the data synthesis is conducted.
In the notebook I have incorporated the research and described the project progress. It illustrates the applied concepts and methods together with the relevant code snippets. The notebook also includes the calculated outputs and plots with accompanying descriptions. Finally, the notebook lists the sources consulted for this assignment.
To view the notebook online, it is recommended to use the Jupyter Notebook viewer, nbviewer. Paste the link to the notebook into the field provided.
Specific tools that aided the delivery of the project were:
- Python programming language, which is widely used across scientific disciplines for handling large amounts of data. Its built-in functionality has been extended by external libraries dedicated to specific purposes. Throughout the project I used the following libraries:
  - NumPy - used for scientific computing in Python; among other things, it supports numerical calculations and random number generation.
  - Pandas - used for data analysis; provides the functionality and data structures needed to work with structured datasets.
  - SciPy - a collection of routines for linear algebra, statistics and other numerical applications.
  - Matplotlib - used for producing plots.
  - Seaborn - used for data visualisation; based on Matplotlib.
- Jupyter Notebook, the environment in which the project was conducted; it provides an interactive, web-based environment for data science and scientific computing.
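To show how these libraries can divide the work, here is a small sketch: NumPy draws a random sample, Pandas holds and summarises it, and SciPy fits a distribution back to it. The log-normal distribution, its parameters and the name `mass_kg` are arbitrary assumptions for illustration, not values taken from the notebook.

```python
import numpy as np
import pandas as pd
from scipy import stats

# NumPy: draw a reproducible random sample (parameters are arbitrary).
rng = np.random.default_rng(42)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=5000)

# Pandas: hold the sample in a labelled structure and summarise it.
masses = pd.Series(sample, name="mass_kg")
summary = masses.describe()

# SciPy: fit a log-normal distribution to the data, with the location fixed at 0.
# For scipy.stats.lognorm, `shape` estimates sigma and `scale` estimates exp(mean).
shape, loc, scale = stats.lognorm.fit(masses.to_numpy(), floc=0)

print(summary)
print(f"fitted sigma = {shape:.2f}, fitted scale = {scale:.2f}")
```

With a sample of this size, the fitted values should land close to the parameters used to generate the data, which is a quick sanity check on a synthesised variable.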
General, high-level reference sources are listed below. References to specific problems are included in the Notebook.
- McGinley, B., Programming for Data Analysis - Project Brief 2019. [pdf] GMIT. Available at: https://github.com/brianmcgmit/ProgDA/raw/master/ProgDA_Project.pdf [Accessed November 2019].
- Programming for Data Analysis - module webpage. [online] Learnonline.gmit.ie. Available at: https://learnonline.gmit.ie/course/view.php?id=1127 [Accessed December 2019].
- Python 3 Documentation. [online] Available at: https://docs.python.org/3/ [Accessed November 2019].
- SciPy - Scientific computing tools for Python. [online] Available at: https://www.scipy.org/about.html [Accessed December 2019].
- Project Jupyter Documentation. [online] Available at: https://jupyter.org/documentation [Accessed November 2019].
- Center for Near Earth Object Studies, NASA. [online] Available at: https://cneos.jpl.nasa.gov/ [Accessed November 2019].
- Meteorite - Wikipedia. [online] Available at: https://en.wikipedia.org/wiki/Meteorite [Accessed November 2019].
- Impact event - Wikipedia. [online] Available at: https://en.wikipedia.org/wiki/Impact_event [Accessed November 2019].
Andrzej Kocielski