Training materials related to data science, artificial intelligence and bioinformatics. Resources which are not available for free are marked ($). You can find links to organizations which provide physical courses (in physicalcourses.md) and links to data sources (in datasources.md). Distance courses by Swedish universities which require official registration are listed in SwedishUniDistanceCourses.md.
Here is a suggested learning path for getting started in data science. Resources are below:
- Install Anaconda (or Miniconda) and get familiar with conda environments and Jupyter notebooks. Alternatively, if your own computer is limited, get familiar with Google Colab.
- Learn Python basics.
- Get familiar with the main functions of the Python tools needed for data processing and scientific computing: regular expressions, NumPy, pandas.
- Get familiar with the basics of data visualization: Matplotlib.
- Get a conceptual understanding of the core principles of machine learning and deep learning, and of hardware basics (GPU, CPU, memory).
- Get a basic understanding of the main machine learning libraries: PyTorch and Keras/TensorFlow.
- Learn how to evaluate machine learning models: metrics, confusion matrices, learning curves.
- Familiarize yourself with the concepts and tools of data science reproducibility: Git, FAIR principles.
- Familiarize yourself with the main concepts and tools in your main area of interest, e.g. image analysis or NLP.
- Try solving specific tasks you are interested in, e.g. from your research project or daily life, using machine learning, and keep learning whatever is required to solve these tasks.
- Learn about more advanced topics that suit your interests: Docker/Singularity, continuous integration/unit testing/build automation, parallel programming.
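As a small illustration of the model evaluation step in the path above, here is a minimal sketch that computes a binary confusion matrix and a few common metrics using only the Python standard library. The function names (`confusion_counts`, `precision_recall_f1`) and the toy labels are made up for this example; in practice you would more likely use the equivalent functions in a library such as scikit-learn.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def precision_recall_f1(counts):
    """Derive precision, recall and F1 from the confusion counts."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy ground-truth labels and model predictions (invented data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(counts)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)

print(dict(counts))
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Writing the counts out by hand once makes it easier to see why accuracy alone can be misleading on imbalanced data, which several of the resources below discuss in more depth.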
Anaconda installation
https://www.datacamp.com/community/tutorials/installing-anaconda-windows
- Setting up tensorflow environment https://www.anaconda.com/blog/tensorflow-in-anaconda
Jupyter notebooks
https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook
Markdown basics
https://guides.github.com/features/mastering-markdown/
Google Colab
https://web.eecs.umich.edu/~justincj/teaching/eecs498/FA2020/colab.html
Hardware recommendations
https://blog.slavv.com/picking-a-gpu-for-deep-learning-3d4795c273b9
https://timdettmers.com/2019/04/03/which-gpu-for-deep-learning/
How to think like a data scientist
https://runestone.academy/runestone/books/published/httlads/index.html
An Introduction to Statistical Learning by James, Witten, Hastie, Tibshirani
Scientific book collection by Springer, many machine learning books included
Runestone Interactive
https://runestoneinteractive.org/pages/library.html
Data8 The Foundations of Data Science course
CS109A: Introduction to Data Science
https://harvard-iacs.github.io/2018-CS109A/
CS109B: Advanced Topics in Data Science from Harvard
https://harvard-iacs.github.io/2018-CS109B/
The Coding Train YouTube channel
https://www.youtube.com/user/shiffman/playlists
Corey Schafer YouTube channel
https://www.youtube.com/user/schafer5/playlists
https://www.analyticsvidhya.com/blog/
Stack Overflow forum
https://nbis-reproducible-research.readthedocs.io/en/latest/
https://github.com/IFB-ElixirFr/IFB-FAIR-bioinfo-training
https://the-turing-way.netlify.app/welcome.html
https://github.com/turing-knowledge-graphs/teaching/
Software engineering best practices
https://www.pythonlikeyoumeanit.com/Module5_OddsAndEnds/Writing_Good_Code.html
https://scikit-learn.org/stable/developers/contributing.html
Pro Git book
https://git-scm.com/book/en/v2
- Installing git
https://git-scm.com/book/en/v2/Getting-Started-Installing-Git
https://docs.github.com/en/pull-requests/collaborating-with-pull-requests
https://the-turing-way.netlify.app/reproducible-research/vcs.html#rr-vcs
https://swcarpentry.github.io/git-novice/
https://realpython.com/python-git-github-intro/
https://realpython.com/advanced-git-for-pythonistas/
Git feature branch workflow
https://gist.github.com/bhpayne/a65d1f9a33daafd4afcab64614b9aaf8
http://www.continuousagile.com/unblock/branching.html
https://learngitbranching.js.org/
https://realpython.com/python-continuous-integration/
CircleCI training resources
https://circleci.com/resources/
Jenkins documentation
https://www.jenkins.io/doc/book/
flake8 (checks code against the Python style guide)
https://realpython.com/python-testing/
pytest documentation (tool for unit testing)
Official Python documentation
- Tutorial https://docs.python.org/3/tutorial/
PEP 8 Python style guide
https://www.python.org/dev/peps/pep-0008/#tabs-or-spaces
Google Python style guide
https://google.github.io/styleguide/pyguide.html
ipython
scipy
numpy
- Tutorial on saving NumPy arrays https://machinelearningmastery.com/how-to-save-a-numpy-array-to-file-for-machine-learning/
pandas
matplotlib
scikit-learn
scikit-image
Python for Everybody courses by University of Michigan
https://www.coursera.org/specializations/python
https://www.edx.org/bio/charles-severance
Codecademy Python course
https://www.codecademy.com/learn/learn-python
Analytics Vidhya Python course
https://courses.analyticsvidhya.com/courses/introduction-to-data-science
Google's Python class
https://developers.google.com/edu/python/
Google's Python Crash Course on Coursera
https://www.coursera.org/learn/python-crash-course
Corey Schafer's Python Programming Beginner Tutorials
https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7
Dataquest Data Analyst path (some free, some $)
https://www.dataquest.io/path/data-analyst/
Lund University COMPUTE graduate school PhD course on reproducible data science with Jupyter
https://github.com/COMPUTE-LU/jupyter-course
Freecodecamp courses
- How to Analyze Data with Python, Pandas & Numpy - 10 Hour Course
https://www.freecodecamp.org/news/how-to-analyze-data-with-python-pandas/
https://www.youtube.com/watch?v=GPVsHOlRBBI
- Python Data Science – A Free 12-Hour Course for Beginners. Learn Pandas, NumPy, Matplotlib, and More
https://www.freecodecamp.org/news/python-data-science-course-matplotlib-pandas-numpy/
https://www.youtube.com/watch?v=LHBE6Q9XlzI
- Matplotlib Course – Learn Python Data Visualization
https://www.freecodecamp.org/news/matplotlib-course-learn-python-data-visualization/
https://www.youtube.com/watch?v=3Xc3CA655Y4
Brief intro to pandas
https://levelup.gitconnected.com/20-pandas-functions-for-80-of-your-data-science-tasks-b610c8bfe63c
Python for Everybody: Exploring Data In Python 3 by Charles Severance
Learn Python the Hard Way by Zed Shaw
https://learnpythonthehardway.org/python3/
Programming Python, 4th Edition by Mark Lutz ($)
http://shop.oreilly.com/product/9780596158118.do
Learning Python, 5th Edition by Mark Lutz ($)
http://shop.oreilly.com/product/0636920028154.do
A Whirlwind Tour of Python by Jake VanderPlas (for people familiar with programming)
https://github.com/jakevdp/WhirlwindTourOfPython
Python Data Science Handbook by Jake VanderPlas
https://github.com/jakevdp/PythonDataScienceHandbook
Scientific Computing with Python 3 by Claus Führer, Jan Erik Solem, Olivier Verdier ($)
https://www.oreilly.com/library/view/scientific-computing-with/9781786463517/
How to think like a computer scientist
https://runestone.academy/runestone/books/published/thinkcspy/index.html
Foundations of Python Programming
https://runestone.academy/runestone/books/published/fopp/index.html
CS109 Homework 1. Exploratory Data Analysis
https://nbviewer.jupyter.org/github/cs109/2014/blob/master/homework/HW1.ipynb
List of Python learning resources
https://forums.fast.ai/t/recommended-python-learning-resources/26888
Python NumPy tutorial
http://cs231n.github.io/python-numpy-tutorial/
Scipy tutorial
https://docs.scipy.org/doc/scipy/reference/tutorial/
Matplotlib tutorial
Pandas tutorials
https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html
http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/
https://www.analyticsvidhya.com/blog/2014/09/data-munging-python-using-pandas-baby-steps-python/
https://pandas.pydata.org/pandas-docs/stable/getting_started/tutorials.html
Lecture notes on Python
https://github.com/jrjohansson/scientific-python-lectures/tree/master/
https://github.com/NBISweden/workshop-python/tree/ht18
Peter Norvig's python training examples
https://github.com/norvig/pytudes#pytudes-index-of-jupyter-ipython-notebooks
https://www.codecademy.com/learn/learn-r
https://docs.python.org/3.6/library/re.html
https://docs.python.org/3/howto/regex.html
https://www.youtube.com/watch?v=DRR9fOXkfRE&feature=youtu.be
https://www.analyticsvidhya.com/blog/2015/06/regular-expression-python/
https://developers.google.com/edu/python/regular-expressions
https://www.debuggex.com/cheatsheet/regex/python
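As a quick taste of what the regular expression references above cover, the following sketch uses Python's built-in `re` module to extract e-mail-like strings, parse a date with named groups, and collapse whitespace. The sample text and the patterns are simplified assumptions chosen for illustration; they are not a complete validator for real-world e-mail addresses or dates.

```python
import re

# Invented sample text for the example.
text = "Contact: anna.svensson@example.org, bo@example.com (updated 2021-03-15)."

# Simplified e-mail pattern: local part, "@", then dot-separated domain labels.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
emails = EMAIL.findall(text)

# Named groups make the parts of a match easy to access by name.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
match = DATE.search(text)
year = int(match.group("year")) if match else None

# re.sub replaces every run of whitespace with a single space.
normalized = re.sub(r"\s+", " ", "too   many\t spaces\n here").strip()

print(emails)
print(year)
print(normalized)
```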
CS229 Machine learning course from Stanford
https://www.youtube.com/watch?v=PPLop4L2eGk&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN
http://cs229.stanford.edu/syllabus.html
CS221 Artificial Intelligence course from Stanford
https://stanford-cs221.github.io/autumn2019/
CS230 Deep Learning course from Stanford
CS50AI Introduction to Artificial Intelligence with Python from Harvard
https://cs50.harvard.edu/ai/2020/
CS188 Introduction to Artificial Intelligence from Berkeley
https://inst.eecs.berkeley.edu/~cs188/fa20/
https://inst.eecs.berkeley.edu/~cs188/fa18/
CS294-158-SP20 Deep Unsupervised Learning from Berkeley
https://sites.google.com/view/berkeley-cs294-158-sp20/home
CSC321 Neural Networks and Machine Learning from University of Toronto
https://www.cs.toronto.edu/~lczhang/321/index.html
Machine Learning course from VU University in Amsterdam
https://www.youtube.com/watch?v=-pve3oIvxa8&index=1&list=PLCof9EqayQgupldnTvqNy_BThTcME5r93
Fast.ai courses
Google Machine Learning Crash Course
https://developers.google.com/machine-learning/crash-course/
Material from Andreas Mueller's courses
MIT Deep Learning and Artificial Intelligence Lectures
Deep RL Bootcamp (2017)
https://sites.google.com/view/deep-rl-bootcamp/lectures
Full Stack Deep Learning Bootcamp
https://course.fullstackdeeplearning.com/
Official PyTorch tutorial
https://pytorch.org/tutorials/beginner/nn_tutorial.html
Tensorboard
https://www.tensorflow.org/tensorboard
Machine learning book by Hal Daumé III
Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
https://www.deeplearningbook.org/
Neural Networks and Deep Learning by Michael A. Nielsen
http://neuralnetworksanddeeplearning.com/
Introduction to Deep Learning by Eugene Charniak ($)
https://mitpress.mit.edu/books/introduction-deep-learning
Deep Learning with Python by François Chollet
https://www.manning.com/books/deep-learning-with-python
Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig ($)
Hands-on Machine Learning with Scikit-Learn, Keras and TensorFlow by Aurélien Géron ($)
https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/
Notebooks for book exercises: https://github.com/ageron/handson-ml2
Reinforcement Learning, An Introduction by R. Sutton & A.G. Barto
http://incompleteideas.net/sutton/book/the-book-2nd.html (draft)
Artificial Intelligence: Foundations of Computational Agents (2nd Edition) by David L. Poole and Alan K. Mackworth
https://artint.info/2e/html/ArtInt2e.html
Machine Learning Yearning: Technical Strategy for AI Engineers, In the Era of Deep Learning by Andrew Ng
https://www.deeplearning.ai/machine-learning-yearning/
A Cookbook of Self-Supervised Learning
https://arxiv.org/abs/2304.12210
Deep Learning Tuning Playbook
https://github.com/google-research/tuning_playbook
colah's blog
- Understanding LSTMs http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Andrej Karpathy's blog
- Recipe for training neural networks http://karpathy.github.io/2019/04/25/recipe/
- The Unreasonable Effectiveness of Recurrent Neural Networks http://karpathy.github.io/2015/05/21/rnn-effectiveness/
- Deep Reinforcement Learning: Pong from Pixels http://karpathy.github.io/2016/05/31/rl/
Joyce Xu's blog
Jay Alammar's blog and YouTube channel
https://www.youtube.com/channel/UCmOwsoHty5PrmE-3QhUBfPQ
Towards Data Science
https://towardsdatascience.com/
https://towardsdatascience.com/intuitively-understanding-variational-autoencoders-1bfe67eb5daf
Overview of activation functions:
https://medium.com/@snaily16/what-why-and-which-activation-functions-b2bf748c0441
NIPS 2016 Tutorial: Generative Adversarial Networks by Ian Goodfellow
https://arxiv.org/abs/1701.00160
https://www.youtube.com/watch?v=AJVyzd0rqdc
AI Lund tv: videos from seminars and workshops @ Lund University
PyTorch tutorial by Jeremy Howard
https://pytorch.org/tutorials/beginner/nn_tutorial.html
Reports on business and societal impact of AI by McKinsey
https://www.mckinsey.com/featured-insights/artificial-intelligence
Reports on business and societal impact of AI by PWC
https://www.pwc.com/gx/en/issues/data-and-analytics/artificial-intelligence.html
Backpropagation https://www.nature.com/articles/323533a0
A Fast Learning Algorithm for Deep Belief Nets https://doi.org/10.1162/neco.2006.18.7.1527
Greedy layer-wise training of deep networks http://papers.nips.cc/paper/3048-greedy-layer-wise-training-of-deep-networks.pdf
Sequence-to-sequence learning https://arxiv.org/abs/1409.3215
Federated learning https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
Computer vision course from Stanford
https://www.youtube.com/playlist?list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv
- Image classification: https://cs231n.github.io/classification/
EECS 498-007 / 598-005: Deep Learning for Computer Vision from University of Michigan
https://web.eecs.umich.edu/~justincj/teaching/eecs498/FA2020/
https://www.youtube.com/playlist?list=PL5-TkQAfAZFbzxjBHtzdVCWE0Zbhomg7r
Lund University COMPUTE PhD course "AI in medicine and life science - AI for image and video data"
https://github.com/COMPUTE-LU/AI4MedLife_imaging_2021
Computer Vision: Algorithms and Applications by Richard Szeliski
Computer Vision - A Modern Approach by David A. Forsyth and Jean Ponce ($)
https://github.com/jbhuang0604/awesome-computer-vision
https://distill.pub/2017/feature-visualization/
https://distill.pub/2018/building-blocks/
https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html
https://github.com/jcjohnson/neural-style
Backpropagation Applied to Handwritten Zip Code Recognition (LeNet) http://yann.lecun.com/exdb/publis/pdf/lecun-89e.pdf
VGG https://arxiv.org/pdf/1409.1556.pdf
GoogLeNet https://storage.googleapis.com/pub-tools-public-publication-data/pdf/43022.pdf
ResNet https://arxiv.org/pdf/1512.03385.pdf
Vision transformers: https://arxiv.org/abs/2010.11929
CS224n NLP course from Stanford
http://web.stanford.edu/class/cs224n/
HuggingFace NLP Course
https://huggingface.co/learn/nlp-course/
Fast.ai NLP course
https://github.com/fastai/course-nlp
https://www.youtube.com/playlist?list=PLtmWHNX-gukKocXQOkQjuVxglSDYWsSh9
Natural Language Processing from Coursera
https://www.coursera.org/learn/language-processing
Natural Language Processing from Berkeley
https://people.ischool.berkeley.edu/~dbamman/nlp23.html
Applied Natural Language Processing from Berkeley
https://people.ischool.berkeley.edu/~dbamman/info256.html
Applied Text Mining in Python from Univ. of Michigan/Coursera
https://www.coursera.org/learn/python-text-mining/home/welcome
Spacy course
AllenNLP tutorials
https://allennlp.org/tutorials
Speech and Language Processing by Dan Jurafsky and James H. Martin
https://web.stanford.edu/~jurafsky/slp3/
https://web.stanford.edu/~jurafsky/slpdraft/
Coreference chapter: https://web.stanford.edu/~jurafsky/slp3/22.pdf
Natural Language Processing by Jacob Eisenstein
https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf
A Primer on Neural Network Models for Natural Language Processing by Yoav Goldberg
http://u.cs.biu.ac.il/~yogo/nnlp.pdf
Natural Language Processing with Transformers by Lewis Tunstall, Leandro von Werra, and Thomas Wolf
Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan & Hinrich Schütze
https://nlp.stanford.edu/IR-book/html/htmledition/irbook.html
Natural Language Processing with PyTorch by Brian McMahan, Delip Rao ($)
https://www.oreilly.com/library/view/natural-language-processing/9781491978221/
Data and Text Processing for Health and Life Sciences by Francisco M. Couto
http://labs.rd.ciencias.ulisboa.pt/book/
Introduction to Natural Language Processing for Text https://towardsdatascience.com/introduction-to-natural-language-processing-for-text-df845750fb63
Sebastian Ruder's blog
NLP posts on Jay Alammar's blog (https://jalammar.github.io/)
- The Illustrated Transformer http://jalammar.github.io/illustrated-transformer/
- The Illustrated BERT, ELMO and co http://jalammar.github.io/illustrated-bert/
Peter Bloem Transformers from Scratch http://peterbloem.nl/blog/transformers
Evaluating Text Output in NLP: BLEU at your own risk https://towardsdatascience.com/evaluating-text-output-in-nlp-bleu-at-your-own-risk-e8609665a213
Steps for effective text data cleaning (with case study using Python) https://www.analyticsvidhya.com/blog/2014/11/text-data-cleaning-steps-python/
Explanation of WordPiece tokenization and NER with BERT https://towardsdatascience.com/named-entity-recognition-with-bert-in-pytorch-a454405e0b6a
SciSpacy
https://github.com/allenai/scispacy
Python regular expressions documentation
https://docs.python.org/3/library/re.html
Tutorials about text cleaning
https://www.analyticsvidhya.com/blog/2014/11/text-data-cleaning-steps-python/
http://ieva.rocks/2016/08/07/cleaning-text-for-nlp/
https://chrisalbon.com/python/basics/cleaning_text/
http://rjweiss.github.io/text-iriss2013/
Tutorial about coreference resolution with neuralcoref
Tutorial for spacy
Tutorial for Huggingface Tokenization
Video about BERT
https://www.youtube.com/watch?v=xI0HHN5XKDo
Lars Juhl Jensen slideshare
https://www.slideshare.net/larsjuhljensen
LSTM http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.676.4320&rep=rep1&type=pdf
Google machine translation https://arxiv.org/abs/1609.08144
Transformer/Attention https://arxiv.org/abs/1706.03762
ULMFiT https://arxiv.org/abs/1801.06146
BERT https://arxiv.org/abs/1810.04805
T5 https://arxiv.org/abs/1910.10683
Speech recognition https://arxiv.org/abs/1712.01769
CS224W: Machine Learning with Graphs from Stanford
https://web.stanford.edu/class/cs224w/
Graph Neural Networks (ESE 5140) from Penn Engineering
Stanford Graph Learning Workshop 2022
https://www.youtube.com/watch?v=GYW286H3SKw
Graph Representation Learning Book by William L. Hamilton
https://www.cs.mcgill.ca/~wlh/grl_book/
Network Science by Albert-László Barabási
http://networksciencebook.com/
Networks, Crowds, and Markets: Reasoning About a Highly Connected World by David Easley and Jon Kleinberg
https://www.cs.cornell.edu/home/kleinber/networks-book/
Understanding Convolutions on Graphs
https://distill.pub/2021/understanding-gnns/
A Gentle Introduction to Graph Neural Networks
https://distill.pub/2021/gnn-intro/
A Practical Tutorial on Graph Neural Networks
https://arxiv.org/abs/2010.05234
Tutorials by Stanford CS224W students
https://medium.com/stanford-cs224w
Grad-Cam tutorial
https://www.tensorflow.org/tensorboard/what_if_tool
https://www.tensorflow.org/responsible_ai/fairness_indicators/guide
https://ai.googleblog.com/2021/08/a-dataset-exploration-case-study-with.html
Carbon Emissions and Large Neural Network Training https://arxiv.org/abs/2104.10350
GShard https://arxiv.org/abs/2006.16668
Switch Transformers https://arxiv.org/abs/2101.03961
https://blog.google/technology/ai/minimizing-carbon-footprint/
Tackling Climate Change with Machine Learning https://arxiv.org/abs/1906.05433
Codecademy SQL course https://www.codecademy.com/learn/learn-sql
Elixir training
https://tess.elixir-europe.org/
NBIS course on single-cell RNASeq
https://nbisweden.github.io/workshop-scRNAseq/
List of statistics resources
https://jvns.ca/blog/2017/04/17/statistics-for-programmers/
Immersive Maths (interactive linear algebra book)
Normalization methods explained https://towardsdatascience.com/normalization-techniques-in-python-using-numpy-b998aa81d754
Computational Linear Algebra for Coders by Fast.ai
https://github.com/fastai/numerical-linear-algebra/
3Blue1Brown - animated maths
https://www.youtube.com/c/3blue1brown
Mathematics for Machine Learning
Applied Math and Machine Learning Basics chapter in Deep Learning book
https://www.deeplearningbook.org/contents/part_basics.html
Mathematical Methods for Physics and Engineering by Riley, Hobson, Bence
https://scikit-learn.org/stable/datasets/
EU Ethics Guidelines for Trustworthy AI
https://ec.europa.eu/futurium/en/ai-alliance-consultation/guidelines#Top
Multi-Task Learning in the Wilderness, Andrej Karpathy, Jun 15, 2019, ICML
https://slideslive.com/38917690/multitask-learning-in-the-wilderness
Trustworthy Human-Centric AI, Fredrik Heintz, 2020, Lund University
http://ai.lu.se/tv/trustworthy-human-centric-ai/
A conversation about AI risk and AI ethics in the age of covid-19, Jaan Tallinn and Olle Häggström
https://www.chalmers.se/en/centres/chair/news/Pages/webinar-19May2020.aspx