- Interactive app: cintel-03-data - see MT Cars tab and Penguins tab
- Repository: cintel-03-data
- Author: Wade Bryson
This time, we add data. Two very common types of data are Excel data and CSV (comma-separated values) data. Python has libraries to read and write both types of data - and more.
We need to understand data before using it. What columns are included? What data types are used? Are there any missing values? Are there any outliers? Dates can be challenging and may need to be converted to a standard format for processing.
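As a minimal sketch of converting dates to a standard format, the example below tries a few input formats in turn and returns an ISO 8601 date string (`YYYY-MM-DD`). The function name `to_iso_date` and the list of accepted formats are assumptions for illustration; adjust them to match the dates in your own data.

```python
from datetime import datetime

# Assumed input formats, tried in order. These are illustrative only;
# replace them with the formats that actually appear in your data.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%Y%m%d"]


def to_iso_date(text):
    """Return the date as 'YYYY-MM-DD', or None if no known format matches."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            # This format did not match; try the next one.
            continue
    return None


for raw in ["2023-03-15", "03/15/2023", "20230315", "not a date"]:
    print(repr(raw), "->", to_iso_date(raw))
```

Note that a value such as `03/04/2023` is ambiguous (March 4 or April 3). This sketch reads it as month/day/year, so confirm the convention used by your data source before converting.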
Inspect the data first. CSV files are easy to open in VS Code. Excel files are easy to open in Excel.
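Beyond opening the file, a short script can answer some of the questions above. The sketch below uses only the Python standard library to list the columns, count the rows, and count empty cells per column. The sample rows are made up for illustration and are not the real penguins data; `summarize_csv` is a hypothetical helper name. Libraries such as pandas offer richer tools for the same task.

```python
import csv
import io

# Made-up sample rows for illustration only (not the real penguins data).
SAMPLE_CSV = """species,bill_length_mm,body_mass_g
Adelie,39.1,3750
Adelie,,3800
Gentoo,46.1,
Chinstrap,46.5,3500
"""


def summarize_csv(text):
    """Return column names, row count, and number of empty cells per column."""
    reader = csv.DictReader(io.StringIO(text))
    columns = list(reader.fieldnames or [])
    missing = {name: 0 for name in columns}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in columns:
            # Treat None (short row) and blank strings as missing.
            if (row[name] or "").strip() == "":
                missing[name] += 1
    return columns, row_count, missing


columns, row_count, missing = summarize_csv(SAMPLE_CSV)
print("Columns:", columns)
print("Rows:", row_count)
print("Missing values per column:", missing)
```

To run this against a real file, read the file contents with `open(path, newline="")` and pass the text to the same function.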
We'll work with two datasets, MT Cars and Penguins, each shown on its own tab in the app.
Copy this starter repository into your own GitHub account by clicking the 'Fork' button at the top of this page.
- Open VS Code and from the menu, select View / Command Palette.
- Type "Git: Clone" in the command palette and select it.
- Enter the URL (web address) of your forked GitHub repository (make sure it contains your GitHub username - not denisecase).
- Choose a directory on your local machine (e.g., Documents folder) to store the project.
- If prompted, sign in to GitHub from VS Code.
With your repository folder open in VS Code:
- Click on this README.md file for editing.
- Update the README.md file by changing the name in the author link above to your own.
- Update the links in the README.md file to use your username instead of denisecase.
- After making changes, send them back to GitHub.
- In VS Code, find the "Source Control" icon and click it.
- Important: Enter a brief commit message describing your changes.
- Change the "Commit" button dropdown to "Commit and Push" to send your changes back to GitHub.
Before making any changes to the code, run the example app and deploy it to shinyapps.io like we did before.
- Read SHINY.md to create and deploy the example app.
- Review the folders and files in this repo.
- Compare them to earlier projects.
- Notice what changed and what remains constant.
The consistent parts are 'boilerplate'. We use boilerplate code a lot. To be productive quickly, focus on the parts that change.
Don't be concerned by the large number of files. Each has a relatively small, specific purpose. Data analysts often use standard ways of organizing their work.
🚀 Rocket Tip: Check out Cookiecutter Data Science - one of many cookiecutter templates that recommend reusable project structures.
To see the app running locally, watch the screencast.