# Module overview (Introduction)

_This notebook provides key resources to get you started._

Note: This Jupyter Notebook was originally compiled by Alex Reppel (AR) based on conversations with [Claude AI](https://claude.ai/) *(version 3.5 Sonnet)*. For this year's materials, further revisions were made using [Claude Code](https://www.anthropic.com/claude-code) *(Opus 4.1)*, including updated documentation and git commit messages.

## Recommendations

If you have not yet been able to install [Anaconda](https://www.anaconda.com/) on your personal computer, you can use [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb) to run the notebooks in your browser.

### Books, videos, and websites

These are optional resources that I have found useful. If you are familiar with basic Python and/or have access to good tutorials, there is no need for you to watch any of the videos. I am sharing these resources in the hope that you find them as useful as I did (and still do!).

- [My favourite introductory book on Python](https://www.manning.com/books/get-programming) _(Ana Bell)_
- [My favourite video tutorial series on Python](https://www.youtube.com/user/schafer5) _(Corey Schafer)_
- [The open edition of "Python for Data Analysis"](https://wesmckinney.com/book/) _(Wes McKinney)_
- [My favourite book on Python / data analysis](https://www.penguinrandomhouse.com/books/669536/introduction-to-computation-and-programming-using-python-third-edition-by-john-v-guttag/) _(John Guttag)_

There is no need for you to purchase these books. I am listing them here for anyone who is struggling and looking for a very easy introduction; Ana Bell's book is brilliant for that. John Guttag's book is much more advanced, but it is perhaps the best introduction to analytics using Python for anyone wishing to go beyond what we're doing in the module.

### Tutorials

There are also many good tutorials available, including:

- Pandas for beginners: [Python Pandas Tutorial -- A Complete Introduction for Beginners](https://github.com/LearnDataSci/articles/blob/master/Python%20Pandas%20Tutorial%20A%20Complete%20Introduction%20for%20Beginners/notebook.ipynb) _(LearnDataSci)_
- Introduction: [Pandas tutorial: Introduction to data manipulation and analysis with Pandas](https://colab.research.google.com/github/ffraile/computer_science_tutorials/blob/main/source/Data%20Manipulation/tutorials/Pandas%20tutorial.ipynb) _(Google Colab)_
- [Python Pandas Tutorial: A Complete Guide](https://datagy.io/pandas/) _(Datagy)_
- Pandas for more advanced users: [From Good to Great Data Science, Part 1 Correlations and Confidence](https://github.com/LearnDataSci/articles/blob/master/From%20Good%20to%20Great%20Data%20Science%2C%20Part%201%20Correlations%20and%20Confidence/notebook.ipynb) _(LearnDataSci)_
- [The Ultimate Guide to the Pandas Library for Data Science in Python](https://www.freecodecamp.org/news/the-ultimate-guide-to-the-pandas-library-for-data-science-in-python/) _(freeCodeCamp)_

### Datasets

#### Data source for this year's group assessment *(default)*

- [120 years of Olympic history](https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results)

#### Alternative data sources

For small(er) projects:

- [World Countries](https://stefangabos.github.io/world_countries/) *(see also [this link](https://www.kaggle.com/datasets/fernandol/countries-of-the-world) on Kaggle)*
- [CIA World Factbook](https://www.cia.gov/the-world-factbook/countries/) *(for web scraping?)*

For potentially more ambitious projects:

- [World Bank Economic Indicators](https://data.worldbank.org/indicator)
- [IMF World Economic Outlook Database](https://www.imf.org/en/Publications/WEO/weo-database/2023/October)
- [UN Comtrade Database](https://comtrade.un.org/data/)

Repositories where you will find many more datasets:

- [The R Datasets Package](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html)
- [Kaggle datasets](https://www.kaggle.com/datasets)

### Example projects

You can find many example projects on sites such as Kaggle:

- [Data Cleaning Challenge: Handling missing values](https://www.kaggle.com/code/rtatman/data-cleaning-challenge-handling-missing-values)
- [Text Data Cleaning - tweets analysis](https://www.kaggle.com/code/ragnisah/text-data-cleaning-tweets-analysis)

## Guidance on generative AI

The University [states](https://intranet.royalholloway.ac.uk/staff/teaching/referencing.aspx) that "[s]tudents are required to provide a statement within any assignment submission that has used generative AI, clarifying their use of the tool." As instructors, we are required to include the following in the assessment brief.

**If you have used a generative AI tool to prepare your assignment(s) for this module, you must include:**

1. Name, version (if available), and provider of the generative AI tool used *(e.g. Copilot, Microsoft)*
2. URL of the tool used (e.g. https://copilot.microsoft.com)
3. A short description of how the generative AI tool was used in the assignment

The University states further that, "[w]hen including an academic reference for a piece of generative AI-produced content in your reference list, please include the name (and version if available) of the tool, and the date the tool was accessed (e.g. Copilot, Microsoft, accessed 3rd June 2024). For an in-text citation, please name the tool and the year accessed (e.g. Copilot, 2024)."
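As a small Python exercise, the reference format quoted above can be sketched as a pair of helper functions. This is only an illustration and not an official University tool: the function names (`ordinal`, `reference_entry`, `in_text_citation`) are hypothetical, and placing the version directly after the tool name is my assumption about how "name (and version if available)" should be read.

```python
from datetime import date


def ordinal(day: int) -> str:
    """Return the day with its English ordinal suffix, e.g. 3 -> '3rd'."""
    if 11 <= day % 100 <= 13:
        suffix = "th"  # 11th, 12th, 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def reference_entry(tool: str, provider: str, accessed: date, version: str = "") -> str:
    """Build a reference-list entry: tool (and version), provider, access date."""
    # Assumption: the version, if given, directly follows the tool name.
    name = f"{tool} {version}".strip()
    # %B gives the full month name (English under Python's default locale).
    when = f"{ordinal(accessed.day)} {accessed:%B} {accessed.year}"
    return f"{name}, {provider}, accessed {when}"


def in_text_citation(tool: str, accessed: date) -> str:
    """Build an in-text citation: tool name and year accessed."""
    return f"{tool}, {accessed.year}"


accessed = date(2024, 6, 3)
print(reference_entry("Copilot", "Microsoft", accessed))
print(in_text_citation("Copilot", accessed))
```

With the University's own example values, the two calls print `Copilot, Microsoft, accessed 3rd June 2024` and `Copilot, 2024`. Always check the output against the current University guidance before submitting.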

My aim for this module is to explore the use of generative AI together, and to learn how to put these requirements into practice. For example, while I do not use generative AI for academic publications _(as most publishers prohibit the use of generative AI)_, I have used [Claude AI](https://www.claude.ai/) to prepare session materials for this module.