In this module, you'll complete a guided exploratory data analysis project, then conduct a second, unique data analysis/exploration project. The goal is to tell a unique and compelling story with data. Your analytical capabilities matter, but so does your ability to communicate in a professional and engaging manner.
This module requires the skills learned in previous chapters. The first, guided exploratory data project focuses on diamonds.csv and is based on Exercise 9.16 beginning on page 352 of the text. The second is a project of your choice, related to your domain.
- Create a new GitHub repo named datafun-06-projects.
- Git clone your new repo into your Documents folder.
- Always ensure your repo has the 3 basic files all our repos need:
  - a good README.md,
  - .gitignore, and
  - about.py.
- Copy these from previous repos as needed.
- Update README.md to reflect the focus of this module.
Read the exercise. Begin considering what you'd like your second project to focus on / showcase.
Use your second project to show all the Python things you know.
- Read from a data file.
- Use statistics - mean, median, mode, standard deviation, variance for one or more of the numerical columns.
- Use built-in functions such as min(), max(), and len() to report values like the count of records and the number of columns.
- Create some custom functions and use branching logic to transform and/or show only a part of the data.
- Get some of your data into a list.
- Use filter(), map(), and list comprehensions to clean and transform the data.
- Use pandas.
- Use the matplotlib hist() function.
- Strive to "tell a story" with data. Use a good title section and useful section headings to professionally present your work.
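Several of the skills listed above (descriptive statistics and built-in functions) can be sketched with the standard library alone. This is a minimal illustration, not a required approach; the `values` list is made-up sample data standing in for one numeric column of your own data file.

```python
import statistics

# Made-up sample values standing in for one numeric column of your data.
values = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "count": len(values),                       # number of records
    "min": min(values),
    "max": max(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "stdev": statistics.stdev(values),          # sample standard deviation
    "variance": statistics.variance(values),    # sample variance
}

# Print each statistic on its own line for a quick report.
for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

Note that `statistics.stdev()` and `statistics.variance()` compute the sample versions; `pstdev()` and `pvariance()` are the population versions.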
- Follow the instructions for Exercise 9.16 (starting pg. 350).
- Complete the exercise in a notebook.
- 1-Load: Get the file, store it in your repo, and load it into a DataFrame.
- 2-View: Display the first 7 rows and the last 7 rows.
- 3-Describe: Use the DataFrame describe() function to calculate basic descriptive statistics for all numeric columns.
- 4-Series: Use the Series method describe() to calculate the descriptive stats for all category/text columns.
- 4-Unique: Use the Series method unique() to get unique category values.
- 5-Histograms: Use the DataFrame's hist() function to create a histogram for each numerical column.
- Required: Use section headings in your Markdown to make it clear that each of these sections is shown in your notebook. They should be numbered 1-5 and include the keyword shown above.
- Required: Include the title of the notebook, and your name and date at the top.
- Do these consistently. A heading and section titles are required in every notebook.
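The numbered steps above might look roughly like the following in a notebook. This is only a sketch under stated assumptions: the inline rows and the column names (`carat`, `cut`, `color`, `price`) are made up so the example is self-contained, and `plot_histograms` is a hypothetical helper name. In your notebook you would load the real diamonds.csv stored in your repo instead.

```python
import io

import pandas as pd

# Made-up sample rows; the real file would be loaded with
# pd.read_csv("diamonds.csv") (file name and columns are assumptions).
csv_text = """carat,cut,color,price
0.30,Ideal,E,400
0.25,Premium,E,410
0.50,Good,F,900
0.70,Premium,G,1500
0.90,Good,H,2500
1.00,Very Good,H,4000
1.20,Very Good,I,5200
1.50,Very Good,J,7000
"""

# 1-Load: read the data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# 2-View: first 7 rows and last 7 rows.
first_rows = df.head(7)
last_rows = df.tail(7)
print(first_rows)
print(last_rows)

# 3-Describe: descriptive statistics for the numeric columns.
numeric_stats = df.describe()
print(numeric_stats)

# 4-Series: descriptive statistics for one category/text column.
cut_stats = df["cut"].describe()
print(cut_stats)

# 4-Unique: the distinct category values in that column.
cut_values = df["cut"].unique()
print(cut_values)


# 5-Histograms: one histogram per numeric column.
def plot_histograms(frame):
    """Draw a histogram for each numeric column (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so steps 1-4 run without it

    frame.hist()
    plt.show()
```

Calling `plot_histograms(df)` in a notebook cell displays the histograms; each step would sit under its own numbered Markdown heading.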
Document your results.
- Execute the completed notebook.
- Export to HTML.
- Include the HTML in your repo.
- Use everything you've learned to conduct a unique data exploration project using some information related to your domain.
- Tell a story with data (do a web search to learn more).
- Use this project to feature all of the key skills learned - creating a professional notebook, writing a good README.md (do a web search).
- Include challenging Python programming aspects - find a reason to use filter(), map(), and list comprehensions.
- Have fun and make it unique.
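One possible reason to use filter(), map(), and list comprehensions is cleaning messy text values before analysis. The sketch below assumes made-up raw values and arbitrary thresholds; `is_number` and `price_band` are hypothetical helper names, not part of any library.

```python
# Made-up raw values, as they might arrive from a text column in a data file.
raw_prices = ["326", " 334 ", "", "n/a", "2757", "18823"]


def is_number(text):
    """Return True when the stripped text consists only of digits."""
    return text.strip().isdigit()


def price_band(price):
    """Label a price using branching logic (thresholds are arbitrary)."""
    if price < 1000:
        return "low"
    elif price < 10000:
        return "mid"
    else:
        return "high"


# filter() keeps only the entries that pass the test.
clean_text = list(filter(is_number, raw_prices))

# map() applies a transformation to every remaining entry.
prices = list(map(lambda text: int(text.strip()), clean_text))

# A list comprehension can filter and transform in a single expression.
expensive = [p for p in prices if p >= 1000]

# A custom function can also be applied inside a comprehension.
bands = [price_band(p) for p in prices]

print(prices)
print(expensive)
print(bands)
```

The same cleaning could be written entirely with list comprehensions; showing both styles side by side is one way to demonstrate that you understand each tool.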
Document your results.
- Execute the completed notebook.
- Export to HTML.
- Include the HTML in your repo.
Use VS Code to commit and sync (push) your repo to GitHub, or run the following in Git Bash or a terminal:
git add .
git commit -m "added code"
git push origin main
- As part of your second project, include a new library or module we won't have time to explore.
- Consider imageio, nltk, textatistic, textblob, wordcloud, or others.
- Basically, look for something that might interest you and see if you can learn it on your own and apply it to your domain/project.
- How comfortable are you starting a project in GitHub, cloning it down, exploring data, and getting it back into GitHub?
- What parts are still too challenging to be enjoyable?
- Add your suggestions in the discussion forum and we'll see if we can't clear up any issues so you feel ready to complete data analytics projects in Python on your own.
- Paste a clickable link to your public GitHub repo:
- Your domain:
- About how long did you spend on class this module:
- In general, how did it go:
- What was the most difficult part:
- What was most interesting:
- Did you do the optional bonus (y/n). How did it go - or why not?
From the Module Overview, paste the numbered list of objectives and assess your ability on each as "Highly proficient", "Proficient", or "Not Proficient":
At the end of this module students will be able to:
- Research tools like pandas, matplotlib, and seaborn (L02)
- Perform a guided exploratory data analysis project (L02)
- Plan and conduct a new data analytics project (L02)
- Ingest data (L02)
- Explore data (L02)
- Calculate descriptive statistics on data (L02)
- Visualize data and results (L02)
- Apply Python to achieve unique objectives (L02)
- Create and manage git repositories (L02)
- Employ git clone, git add, git commit, and git push - either through an IDE or at the command line (L02)
- Tell a story with data in a unique and compelling way (L02)
- Communicate professionally (L02)
- Get Started
- Task 1. Begin with the End in Mind
- Task 2. Diamonds Dataset
- Task 3. Output
- Task 4. Push Repo to GitHub
- Optional Task 5. Bonus
- Reflection (on your own)
- Submission Instructions
- Submit
- Part 1 - Project
- Part 2 - Self Assessment