Python Data Analysis Course 11.2019
Build a complete data analysis pipeline using the Python ecosystem
- Define the problem
- Gather the raw data
- Process (clean) the data
- Explore
- Analyze (apply models, make predictions)
- Report and visualize results in a form understandable to stakeholders
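
The stages above can be sketched end to end in a few lines. This is only a minimal illustration using the Python standard library, not the course's actual material: the inline CSV data (hours studied vs. exam score) and all function names are hypothetical, and the "model" is a plain least-squares line. In the course itself these steps would normally be done with Pandas, NumPy and scikit-learn.

```python
import csv
import io
import statistics

# Hypothetical raw data: hours studied vs. exam score, with one incomplete row.
RAW_CSV = """hours,score
1,52
2,58
3,
4,71
5,77
"""


def gather(text):
    """Gather: parse raw CSV text into a list of dict rows (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Process: drop rows with missing values and convert the rest to floats."""
    cleaned = []
    for row in rows:
        if all(value not in (None, "") for value in row.values()):
            cleaned.append({key: float(value) for key, value in row.items()})
    return cleaned


def explore(rows):
    """Explore: mean, min and max for every column."""
    summary = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        summary[column] = {
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


def fit_line(rows, x_name, y_name):
    """Analyze: ordinary least-squares fit of y = slope * x + intercept."""
    xs = [row[x_name] for row in rows]
    ys = [row[y_name] for row in rows]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def report(slope, intercept, hours):
    """Report: one plain-language sentence for stakeholders."""
    predicted = slope * hours + intercept
    return f"Each extra hour adds about {slope:.1f} points; {hours} hours predicts a score of {predicted:.1f}."


if __name__ == "__main__":
    rows = clean(gather(RAW_CSV))
    print(explore(rows))
    slope, intercept = fit_line(rows, "hours", "score")
    print(report(slope, intercept, hours=6))
```

Keeping each stage in its own function makes it easy to swap one out later, for example replacing `gather` with a web scraper or an API call.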
- Git and GitHub
- short intro to the command line
- Text Editors
- Anaconda
- cloud-based tools (Google Colab, myBinder, etc.)
- basic data types
- working with compound data (slicing)
- structure (functions, classes)
- program flow (conditionals)
- input/output
- importing external libraries
- introduction to NumPy, Pandas
- web scraping with Selenium, Beautiful Soup
- using APIs
- reintroduction to SQL databases
- ACID compliance
- Node.js
- MongoDB
- other NoSQL databases
- The 4 Vs - (volume, variety, velocity, veracity)
- Apache Hadoop Ecosystem
- Apache Lucene -> Elasticsearch
- advancing your NumPy, Pandas skills
- Pandas, matplotlib, etc.
- Graph Analysis (Network Analysis)
Note: ML section may be expanded if good progress is made in other sections :)
- test/train data
- supervised/unsupervised learning
- classifiers
- regressors
- scikit-learn
- TensorFlow with Keras
- PyTorch
- Tesseract for OCR
- Power BI or Tableau
- Python visualization libraries (matplotlib, Seaborn)
- Graphviz
- Dash/Plotly
- PDF processing
- PyQt
- NLTK
- Course Project
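
As a small taste of the "test/train data" topic listed above, here is a hedged sketch of a random train/test split written with the standard library only. The function name `split_train_test` and its parameters are made up for this illustration; in practice scikit-learn's `sklearn.model_selection.train_test_split` does this job.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return (train, test) lists.

    Hypothetical helper for illustration; a fixed seed keeps the split reproducible.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy, so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    samples = list(range(20))
    train, test = split_train_test(samples, test_fraction=0.25, seed=42)
    print(len(train), "training samples,", len(test), "test samples")
```

The model is fitted on the training part only, and the held-out test part is used to check how well it handles data it has not seen.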
Tools of the trade:
Anaconda Distribution (Python, R and more): https://www.anaconda.com/download/