This project contains the Presentation notebooks for a Data Science Module held at Milestone Institute in 2022, in collaboration with Wolfram Research.
The module has two ambitious main goals:
- It aims to guide students through several model-making processes, in which we take real-world problems from different fields and build an approximate mathematical and/or computational model with which we predict and optimise. (An essential part of working with models is knowing their domain of validity, which will be determined critically and sometimes extended iteratively.)
- In this module we will work with real-world (and sometimes generated) data, look at it from different angles, process it, and extract information through visualisation and computation. In several cases, we will go through how to draw conclusions, optimise parameters, or make predictions based on data.
For the model-making processes we will usually follow 4 main steps:
- Definition of the problem and asking relevant questions,
- Abstraction of the questions into a computable format,
- Computation on data, resulting in various plots, charts and quantities, and finally
- Interpretation of the results, to see how well we addressed the original questions and how we could go further.
The main environment for the Module will be Wolfram Language (in particular Mathematica, for which a license will be provided) because of its gentle learning curve, rich visualisation options and easy-to-access curated datasets.
However, open-source software (Python, Sage) and environments (CoCalc) will also be introduced, and students can use these to complete their projects as well.
Students of this module will strengthen their analytical skills and critical thinking, and will get a glimpse into machine learning and data science.
The Data Science Module consisted of 8 sessions:
- Introduction to Wolfram Language
- Mathematics: Monte Carlo integration and the volume of high-dimensional spheres
- Physical chemistry: Effervescent Vitamins and Experimental design
- History: The Glorious Past
- Biology: DNA data
- Literature: Natural Poetry Processing
- Solar Panel Investment
- Project Presentation
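To give a flavour of the Mathematics session topic, here is a minimal Python sketch of a Monte Carlo estimate for the volume of a unit ball in several dimensions. It is an illustration only and is not taken from the session notebooks; the function names `mc_ball_volume` and `exact_ball_volume`, the sample count and the seed are all assumptions made for this example.

```python
import math
import random


def mc_ball_volume(dim, n_samples=100_000, seed=0):
    """Estimate the volume of the unit ball in `dim` dimensions by Monte Carlo.

    Hypothetical helper for illustration: samples points uniformly from the
    enclosing cube [-1, 1]^dim and counts how many fall inside the ball.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    inside = 0
    for _ in range(n_samples):
        # Squared distance from the origin of a uniformly drawn point.
        r2 = sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(dim))
        if r2 <= 1.0:
            inside += 1
    # Hit fraction times the volume of the enclosing cube, 2^dim.
    return (inside / n_samples) * 2.0 ** dim


def exact_ball_volume(dim):
    """Closed-form volume of the unit ball: pi^(dim/2) / Gamma(dim/2 + 1)."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1)


if __name__ == "__main__":
    for d in (2, 3, 5, 10):
        print(d, mc_ball_volume(d), exact_ball_volume(d))
```

As the dimension grows, the ball fills an ever smaller fraction of the enclosing cube (about 0.25% in 10 dimensions), so this naive sampling scheme needs many more samples to keep the same relative accuracy.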
Sessions 2 to 7 each contained two separate Presentation notebooks:
- Mathematica .nb notebook (in Wolfram Language)
- Jupyter .ipynb notebook (in Python)
To View, Interact with and Run the computational Presentation notebooks, you will need to download the session folders, together with the necessary data files, and open the notebooks in a suitable environment.
For Mathematica versions you will need:
- Wolfram Engine, i.e., a Wolfram Desktop or Mathematica installation to Run
- (notebooks created in Mathematica 13.0)
- or (a free) Wolfram Player to View and Interact
For Python versions you will need:
- Anaconda environment (recommended, not required)
- Python 3.9
- Jupyter notebook environment, such as:
- Installation steps for further required packages are included inside the notebooks
- Jozsef Konczer - Initial work - Konczer
- Anita Lilla Verő - immense help in IPython notebook implementations - anitavero
See also the list of contributors who participated in this project.
This project is licensed under the MIT License; see the LICENSE.md file for details.
- First of all, to all contributing staff at Milestone Institute:
- Andor Kelenhegyi for allowing the Module and managing the collaboration
- Peter Symmons for in-depth consultation on DNA data
- Adrian Matus for providing valuable resources for Computational Social Science
- to people from Wolfram Research:
- Magali Dufour for managing the collaboration at Wolfram Research Europe Ltd
- Blaec Bejarano from SageMath for his help with CoCalc
- Krisztián Gergely for the initial scraping for Natural Poetry Processing
- and last but not least, to all my Students, who with their enthusiasm and creative Final Projects compensated for all the hard labour that was needed to make this project possible.