I originally posted this review on LinkedIn; this is a copy of it.

This April I completed Udacity’s Data Analyst Nanodegree. It was a long journey that started almost a year ago, when in April 2018 I applied for Bertelsmann’s Data Science Scholarship. I was first selected for the challenge course, which lasted three months, and then, in September 2018, I was awarded the scholarship that covered the Data Analyst for Enterprise Nanodegree (DAND for short). Here is my review of the program.

Time Commitment

DAND is huge! I spent almost 7 months working on the program. It took longer than I expected, but I also had to invest an enormous amount of time in preparing my application for a Master’s degree. I think I would have completed it faster if I hadn't had other commitments. I should also note that I already had a theoretical background in both statistics and computer science before starting the program, so I didn’t find it hard and didn’t struggle with new concepts, but it was really very time-consuming. I can say that I used DAND as a framework for learning new things. For example, there was a great module on practical aspects of linear regression, but I went a bit further and enhanced the material with the derivation - see here.

If you want to get everything out of each module, it’s hard to do more than two projects per month, even if you’re studying full time. The program itself consisted of 8 core modules, 8 projects (one for each core module) and 3 extracurricular modules (which included a huge module on machine learning algorithms in sklearn). I also did one extra project (with Wikipedia links) - just for fun.

Content

The content was a bit different from what I expected. Most of the concepts were already familiar to me in one way or another, but I still learned a lot. To be precise:

  1. I learned to design the analysis process. No more jumping into coding right from the start (shame on me - I did that in the past when time was short). The first few projects came with a report template, so you couldn’t avoid the thinking and designing stage. It was obligatory. But after going through it once, I understood its power.
  2. I learned to do data cleaning in an organised way. I had done a lot of data cleaning before and tried to organise it myself, but the course showed me the proper framework for that. Actually, when I finished this part of the project, my thought was - this is how it should be done. You can see it here. And it did take an enormous amount of time, yes.
  3. I learned principles of visualisation which I wasn't aware of: e.g. the data-ink ratio and the lie factor. I felt ashamed of the visualisations I did before. I also realised how misleading a visualisation can be…
  4. I learned a systematic approach to EDA. Before that, I used to simply plot some variables that looked interesting when performing EDA. After Project 6 I started to do that in a systematic way: first univariate analysis, then bivariate, and only then - multivariate based on the previous two. Simple but powerful!
  5. I learned about storytelling and the difference between exploratory and explanatory data analysis! And this part was really exciting. For the last project I found an interesting dataset that contained data about the spread of chytridiomycosis in Australia. Using what I learned about proper data wrangling, I prepared and enhanced it with additional data from Wikipedia and then created a story in Tableau.
  6. I learned how to apply statistical concepts to real data (it was a bit different from what I was taught at the university!).
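
To make the lie factor from point 3 concrete, here is a minimal Python sketch. It follows Tufte's usual definition (the relative size of the effect shown in the graphic divided by the relative size of the effect in the data); the function names and the numbers are my own illustrative assumptions and are not taken from the course material.

```python
def relative_change(first, second):
    """Relative size of an effect: |second - first| / |first|."""
    if first == 0:
        raise ValueError("first value must be non-zero")
    return abs(second - first) / abs(first)


def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Effect shown in the graphic divided by the effect in the data.

    A value close to 1 means the graphic represents the data faithfully;
    values far from 1 mean the graphic exaggerates or understates the change.
    """
    data_effect = relative_change(data_first, data_second)
    if data_effect == 0:
        raise ValueError("the data values must differ")
    return relative_change(drawn_first, drawn_second) / data_effect


# Hypothetical example: the data grow from 100 to 150 (a 50% increase),
# but the bar chart's axis starts at 90, so the bars are drawn
# 10 and 60 units tall (a 500% increase in bar height).
truncated = lie_factor(100, 150, drawn_first=10, drawn_second=60)

# The same data drawn on an axis that starts at zero.
honest = lie_factor(100, 150, drawn_first=100, drawn_second=150)

print(truncated, honest)
```

In this made-up case the truncated axis exaggerates the change tenfold, while the zero-based axis gives a lie factor of 1.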

Mentors and Reviews

One of the main reasons why I was interested in a Nanodegree was that there were mentors and code reviews. I really needed feedback at this point. To be honest, not all the mentors were great, but the mere fact that there are formal requirements and that the work will be assessed by real people made me work harder. Some of the mentors were exceptional (I would like to specifically mention Alan Aboudib), some were just okay. In the beginning there was no way to talk to a mentor directly - they introduced that option later in the program. I think I would have studied differently if I had had this option from the very beginning.

I also asked my peers for feedback, and their comments were as valuable as feedback from mentors. For example, for the Tableau project (Project 8) I asked Peter Lipp to review my work and he identified issues that had gone unnoticed by the Udacity mentor. So even though the project had already been approved, I stepped back and made changes according to Peter’s comments. Only then did I feel that the project was completed.

One important thing to note is that mentors see a lot of projects every day and I guess it is hard to see the same datasets analysed again and again and again. The golden rule for the student here is to choose your own dataset when there is an option to do so. It will be challenging and rewarding for you, and you can be sure the mentor will read your work much more thoroughly. I noticed that when I chose my own datasets, the feedback was more emotional and more engaged.

To summarise, when it comes to mentorship, it is a great opportunity and it depends on the student whether she will use it to its full extent.

Summary

Program Pros:

  • You learn a lot of practical aspects that you can’t find in theory books. For example, you learn to face situations where there are no strong correlations: even though it’s tempting to claim that you have found something, it’s important to be able to admit the opposite.
  • By the end of the program, you have a portfolio of 8 completed projects with detailed reports.
  • Experience. After the program I feel quite confident regarding the steps I should take when I’m handed a new dataset.
  • Detailed feedback on the projects. There are two important skills that you learn here: 1) you learn to accept feedback - sometimes you have to submit the project several times, which is great practice that I’m sure will be very useful at work, 2) you learn about your strengths and weaknesses.
  • Flexibility in terms of project complexity - you can choose your own datasets for some of the projects - I think these were the best projects in my case.

Program Cons:

  • Not enough theory! I think any practice must have a sound theoretical basis. Adding some optional mathematical challenges wouldn’t hurt, nor would links to selected quality resources that explain the hard parts.
  • Too much focus on EDA. BUT I’m not sure if it’s a real con, because I think I learnt the most important part of the analysis process. I often read that there is a trend of applying machine learning algorithms without performing full EDA - which
  • No forum. The community we had during the challenge course (before DAND) was something really exceptional - so much energy, so much support for each other. I met many truly amazing people there. Compared to that, Student Hub and even Slack felt quite empty. I think it would have been much better if there had been a forum (like the one in the challenge course). Student Hub is nice, but the average response time was about 12 hours, so it’s quite useless if you want to discuss something. There was also the Knowledge Base, but I was never able to find any answers there. I actually ended up interacting with the people I met during the challenge course who were also awarded the scholarship. Which was... still great! :)

I think this was a great experience and I don’t regret investing time in it. Thank you, Bertelsmann and Udacity! Thank you, Peter, Isra and Nesreen! It was amazing to study with you!