Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.
Josh Wills
The dark side of that quote is real! Data scientists don't program as well as software engineers, and they are reasonably soft when it comes to the larger field of statistical analysis. We can improve over time. However, the structure of our work often keeps us from specializing in these technical areas, because we are usually scaling up in other domains at the same time. If we did specialize, we would be called statisticians or software engineers.
- 1998: Start my undergraduate at BYU
- 2000: Transfer to the University of Utah
- 2003: Undergraduate in Economics (er Socialist History) from the U.
- 2003-2005: Master’s degree in Statistics from BYU.
- 2005-2012: Statistician: Pacific Northwest National Laboratory (PNNL)
- 2012-2015: Reformed statistician: PNNL
- 2015-Current: Data Science Professor: BYU-I
- 2015-Current: Owner and Data Scientist of Data-Driven Consulting (Medical records and Child Health Analytics, Environmental Sampling, Business Consulting)
GitHub! Data scientists need to demonstrate their coding experience and data depth. GitHub gives us the social space to demonstrate these skills.
It is no exaggeration to say that git (and other forms of version control software) underlie the entire world of open-source software, and are central to the operation of nearly every tech company on the planet. ... OK, now the bad news: learning git kinda sucks. I mean, it’s not painful like performing an appendectomy on yourself without anesthesia, and it’s not hard like quantum mechanics or geometric topology; it’s definitely something anyone can learn. ref
GitHub is key to your employment as a Data Scientist.
This is GitHub, the world’s largest code repository platform online. A platform used by some 50 million software developers to host their coding projects, most of them open-source — meaning others can access their codes and modify them to create better versions if they feel like.
Most of the internet is produced or hosted on GitHub in the form of code. “What Gmail is to email, GitHub is to writing software,” says Kiran Jonnalagadda, co-founder of HasGeek, a platform to build and discover peer groups.
Read more here.
It signals that you are a programmer as well as an analyst.
Github is our version control, and we have everything on Github. Definitely having strong git experience is very helpful. The way my team is using it is through forking. We fork the main file and then pull from and to it to update the code.
Keaton Sant, Data Scientist at John Deere
Yes.
It feels weird at first but quickly becomes second nature. There is more bad news, although our pain will be short-lived because students primarily work in their own repositories. Do you use GitHub to work with other people or to coordinate your own work from multiple computers? If so, after you recover from the initial setup, git will crush you again with merge conflicts. And this is not one-time pain; this could be a dull ache for a long time. The best remedy is prevention, but it also helps to understand how to back out of tricky situations and tackle them on your own terms.
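To make the merge-conflict warning concrete: when git cannot reconcile two edits to the same lines, it writes both versions into the file between marker lines. The sketch below is only an illustration; the function name `find_conflict_markers` and the sample text are made up for this example, and the check is deliberately naive (a Markdown heading underlined with `=` characters would trigger a false positive).

```python
def find_conflict_markers(text):
    """Return (line_number, marker) pairs for lines that start with a
    git merge-conflict marker."""
    markers = ("<<<<<<<", "=======", ">>>>>>>")
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for marker in markers:
            if line.startswith(marker):
                hits.append((number, marker))
                break
    return hits


# Hypothetical file contents after a conflicted merge.
sample = """\
Title: Analysis notes
<<<<<<< HEAD
The model uses 10 trees.
=======
The model uses 50 trees.
>>>>>>> laptop-branch
"""

for number, marker in find_conflict_markers(sample):
    print(f"line {number}: {marker}")
```

Resolving the conflict means editing the file so that only the text you want remains and the three marker lines are gone, then staging and committing. If a merge goes badly, `git merge --abort` returns the repository to its pre-merge state so you can try again.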
Managing a project via Git/GitHub is much more like the Google Doc scenario and enjoys many of the same advantages. It is definitely more complicated than collaborating on a Google Doc, but this puts you in the right mindset. ref
- Don't post assignments
- Do post unique code and projects using skills from your classes
- Use private repos with a student education account to manage your coursework
- Use it to communicate
If you are trying to get a job, your GitHub space should be organized. Take the time to make this space your coding ‘social media’, where people see the best side of your work.
- Make your landing page stand out by managing your profile README. Use this guide for additional inspiration.
- Track your work and share it with the world.
- Organize and document your repositories. Here are some great examples.
- Find a project you could support (long-term goal).
GitHub aims to be the social communication tool for coders (reference). Versioning and sharing code is the core, but ignoring the other available tools is not wise.
- GitHub Pages
- Project and Organization Wikis (D3 Example)
- Issues
- Discussions
- Projects
- GitHub Actions (I use the peaceiris action for Hugo for our data science programming course at BYU-I. The R for Data Science book does as well.)
You don't need to make these projects complicated. These projects are built to show your work using the skills you have developed during school. I would make sure that these personal projects are presentable. You want to demonstrate your creativity. You could use the following links to find a new data set.
Let's go through the prAcess with this Github repo
- Fork repo on Github
- Clone repo to local computer
- Fix the spelling error above and save the file
- Add a new file called notes.md
- Add or stage your changes
- Commit your work
- Push to GitHub
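The steps above can be sketched as a short Python script that assembles the matching git commands. This is a minimal sketch under stated assumptions, not the course's official procedure: `FORK_URL` and the `REPO-NAME` directory are placeholders, the helper names are invented for illustration, and forking itself happens in the GitHub web interface rather than in code. Fixing the spelling error is done by hand in an editor between cloning and staging.

```python
import subprocess
from pathlib import Path

# Hypothetical placeholder: replace with the URL of your own fork.
FORK_URL = "https://github.com/YOUR-USERNAME/REPO-NAME.git"


def clone_command(fork_url, repo_dir):
    """Clone step: copy the fork from GitHub to the local computer."""
    return ["git", "clone", fork_url, str(repo_dir)]


def add_notes_file(repo_dir, text="Notes from class.\n"):
    """New-file step: create notes.md inside the cloned repository."""
    path = Path(repo_dir) / "notes.md"
    path.write_text(text, encoding="utf-8")
    return path


def publish_commands(repo_dir, message):
    """Stage every change, commit it, and push it back to the fork."""
    git = ["git", "-C", str(repo_dir)]
    return [
        git + ["add", "--all"],
        git + ["commit", "-m", message],
        git + ["push", "origin", "HEAD"],
    ]


def run(commands):
    """Run each git command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    repo_dir = Path("REPO-NAME")
    planned = [clone_command(FORK_URL, repo_dir)]
    planned += publish_commands(repo_dir, "Fix spelling and add notes.md")
    for command in planned:
        print(" ".join(command))
```

To run it for real, call `run([clone_command(FORK_URL, repo_dir)])`, make your edits (including `add_notes_file(repo_dir)`), and then call `run(publish_commands(repo_dir, "your message"))`. The same commands can be typed directly into a terminal.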



