Python is an easy-to-learn language that is well suited to data analysis. This repo showcases some of the basic abilities that Python (and some of its more popular packages) has for cleaning, modeling, and organizing data.
We will be using the following packages in this repo:
- matplotlib (data visualization)
- numpy (scientific computing / data analysis)
- pandas ("spreadsheet-like" data analysis)
- plotly (data visualization)
- openpyxl (reading and writing Excel files)
Other packages that we'll use:
- re (regex)
- pprint (better terminal output formatting)
- xml/json/csv (importing data from these file types)
- math
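The standard-library modules above can cover a surprising amount of import and cleanup work on their own. The sketch below is illustrative only: the inline samples, the `price` field, and the helper names (`clean_price`, `load_json`, `load_csv`, `load_xml`) are hypothetical. It loads the same records from JSON, CSV, and XML text, strips currency formatting with `re`, and averages the result with `math.fsum`.

```python
import csv
import io
import json
import math
import re
import xml.etree.ElementTree as ET
from pprint import pprint

# Hypothetical inline samples standing in for real .json/.csv/.xml files.
JSON_TEXT = '[{"name": "Ann", "price": "$1,200.50"}, {"name": "Bo", "price": "300"}]'
CSV_TEXT = 'name,price\nAnn,"$1,200.50"\nBo,300\n'
XML_TEXT = "<items><item name='Ann' price='$1,200.50'/><item name='Bo' price='300'/></items>"


def clean_price(raw: str) -> float:
    """Drop everything except digits, '.', and '-' (e.g. '$' and ','), then convert."""
    return float(re.sub(r"[^0-9.\-]", "", raw))


def load_json(text: str) -> list[dict]:
    return [{"name": r["name"], "price": clean_price(r["price"])} for r in json.loads(text)]


def load_csv(text: str) -> list[dict]:
    # DictReader uses the header row as keys and honors the quoted "$1,200.50".
    reader = csv.DictReader(io.StringIO(text))
    return [{"name": r["name"], "price": clean_price(r["price"])} for r in reader]


def load_xml(text: str) -> list[dict]:
    root = ET.fromstring(text)
    return [
        {"name": el.get("name"), "price": clean_price(el.get("price"))}
        for el in root.iter("item")
    ]


records = load_json(JSON_TEXT)
# fsum avoids accumulating floating-point error when summing many values.
mean_price = math.fsum(r["price"] for r in records) / len(records)

pprint(records)
print("mean price:", mean_price)
```

All three loaders return the same cleaned list, so the rest of an analysis does not need to care which file format the data came from.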
You can use this repo for quick examples. Some examples include:
- analyzing existing data with NumPy/pandas
- storing data in powerful structures such as the pandas `DataFrame` and NumPy's `ndarray`
- cleaning data
- importing data from XML, JSON, Excel, and other file formats
- plotting data with Plotly/Matplotlib
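As a quick illustration of the storing and cleaning items above, here is a minimal sketch using pandas and NumPy. The `city`/`temp_f` columns and their values are hypothetical, and `clean` is an illustrative helper rather than code from this repo. It trims and normalizes text, coerces bad numbers to `NaN`, drops incomplete rows, then hands a column to NumPy as an `ndarray` for vectorized math.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data; in practice this might come from pd.read_csv,
# pd.read_json, or pd.read_excel.
raw = pd.DataFrame(
    {
        "city": [" Boston", "denver ", "Boston", None, "Denver"],
        "temp_f": ["71", "80", "68", "75", "n/a"],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()  # leave the raw frame untouched
    # Trim stray whitespace and normalize capitalization ("denver " -> "Denver").
    out["city"] = out["city"].str.strip().str.title()
    # Turn non-numeric strings such as "n/a" into NaN instead of raising.
    out["temp_f"] = pd.to_numeric(out["temp_f"], errors="coerce")
    # Drop rows missing either field and renumber the index from 0.
    return out.dropna(subset=["city", "temp_f"]).reset_index(drop=True)


cleaned = clean(raw)

# Pull the column out as a NumPy ndarray and convert Fahrenheit to Celsius.
temps_f = cleaned["temp_f"].to_numpy()
cleaned["temp_c"] = np.round((temps_f - 32) * 5 / 9, 1)

# Average temperature per city.
summary = cleaned.groupby("city")["temp_f"].mean()

print(cleaned)
print(summary)
```

For real Excel input, `pd.read_excel("some_file.xlsx")` (a placeholder path) returns a `DataFrame` that can go through the same `clean` step; pandas relies on openpyxl to read `.xlsx` files.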
If you can improve this repo or add something valuable to it, please consider contributing. Please follow these guidelines:
- If you are working on an issue, please comment on the issue and reference it in your commit message
- Provide mock/test data if you are working on a new data analysis example