My Journey on Kaggle

A repo documenting my journey on Kaggle: learning from amazing Kagglers like Radek (on Twitter), @cdeotte (on Kaggle), and hopefully many more.

2023 Todos

  • implement every notebook introduced by Radek, e.g. [[Playground Series Season 3, Episode 1#^6f2913|housing price comp]], [[00fastainbs/my_journey_on_kaggle/Learn from Radek#^52405f|2 things learnt on housing price comp]]
  • follow @radek1 on [[Playground Series Season 3, Episode 1#^6f2913|Playground Series Season 3, Episode 1]], starting with Season 3 and hopefully more
  • make the OTTO comp dataset small and iterate fast, like the Playground Series
  • replicate Radek's journey in the RSNA competition
  • replicate DienhoaT's journey in the PogChamp Series competition

Archived for future

Interesting courses and notebooks to check out

  • Distance to Cities features & Clustering🔥 notebook

Highlights of inspirations on my journey

I had given up on learning ML/DL before, and I came back in May 2022. Thanks to your book and fastai, I won't give it up anymore. Why? 1) the super amazing fastai alumni share selflessly online, and 2) I need no permission to "hold their hands" and learn from them publicly, without limit. From this tweet

My story, my path

  • I spent 2 months making detailed notes on courses, but notes alone do not make me a DL practitioner 😱 😭
  • I spent 2-3 months writing a little tool to help me debug faster and to rid me of the fear of reading and experimenting with the fastcore and fastai source code 🎉 😂
  • however, it still did not move me toward being a DL practitioner 😱 😭
  • Thanks to Radek's book Meta-learning 💕
    • I learnt to surrender my own assumptions about learning and started to embrace the fastai way of learning: practice and experiment 🔥
    • I got started learning on Kaggle, using Radek's posts and notebooks as my guide in the OTTO recsys competition
  • After spending more than a month in the comp, I was totally stuck implementing a model pipeline provided by Chris, and I think the main cause is that I don't know pandas well enough. ⛔
  • 😂 💕 I can't say "Thank You" enough to the amazing @wasimlorgat, who patiently and thoughtfully helped me understand what I want, how I learn, why I get stuck, and how to get unstuck; below are just a few of the things he taught me: 😂 💕 🚀
    • avoid rabbit holes like diving deep into pandas or polars 🦮
    • write down my goals or todos before acting, to stay focused and finish fast 🚀
    • don't give in to discouraging thoughts; recognize and celebrate every little bit of progress along the way 💗
  • In OTTO comp 🔥 [[OTTO Recsys Comp (New)]]
    • I learnt to read and follow discussions to pick up small tricks that improve my public scores
    • I learnt by running and tweaking notebooks from Radek and Chris Deotte, and my previous debugging experience helped me understand every line of their code without fear 🎉 ⭐
    • I learnt not to give up when stuck, but to find easier tasks to build up my skills and come back later
  • In Playground Series, Season 3 comp 🔥
    • For the first time, I experienced and understood what Jeremy and Radek mean by 'iterate fast'
    • I changed my view of toy datasets and comps, realizing they can be powerful tools for learning new and important techniques fast
    • I have finished the goals I set when I joined the comp
  • Walk with fastai course
    • My goal is to dive deep into this course guided by Radek's [[00fastainbs/my_journey_on_kaggle/Learn from Radek#^1a922c|constraint]] principles
  • 2023.1.10 Radek's newsletter helped me identify and tackle my bottleneck 🔥🔥🔥
    • What's the one thing, the bottleneck, for me?
      • not building pipelines enough, not iterating on pipelines enough
    • How will I tackle this bottleneck?
      • on Kaggle, it's more like learning to build and iterate in the wild: I don't know what I will learn each day, and if I'm lucky I can find guides along the way, but in general it's a matter of exploring without knowing what lies ahead
      • I learnt how to build and iterate fast from Kaggle comps like [[Playground Series Season 3, Episode 1]], and I am super excited about it because I feel this is exactly what I was missing.
      • with fastai part 1, part 2, and WWF, I know what's ahead is systematic and promises to build me up as a proper practitioner, but the tasks are overwhelmingly massive.
      • My plan is to turn course notebooks and Kaggle comps into building and iterating on pipelines, through which I will learn all the fastai techniques in time.
  • The secret to becoming an ML practitioner is perseverance, not intensity 🔥🔥🔥🔥🔥🔥🔥🔥🔥 [[00fastainbs/my_journey_on_kaggle/Learn from Radek#^893aa2|details]]
  • keep the constraint on WWF: dissolve the WWF course notebooks into a pipeline with different sets of components 😂 🔥 🚀 2023.1.10
  • 2023.1.11, I spent a whole day learning lineapy, hoping it could speed up iterations. However, so far I have not really had a chance to use it in practice, so it was not the right time and it actually cost me a day 😢😢😢. (It could be helpful for reusing code, but I am not yet in a place to use it) [[Learn from Hamel#^5b7b1c|details]]
  • 2023.1.12 I wasted much time digging into Categorize and CropPad, which are not the central learning points of lesson 2 of WWF. I should focus on the most important thing, the pipeline and its components; the detailed usage of those techniques can be learnt later, when they are truly needed in practice. 😱😱😱
  • Radek's AMA is amazing, with many great insights; in particular, the ideas on how to subset properly, iterate fast, and train on the full dataset are what I need most at the moment. [[Learn from Radek#^73faa3|details]] 💡⚡🔥
  • After 3 days on a course, I realized that the learning strategy that works for me runs through an ML project such as a Kaggle comp; a course that doesn't focus on a real-world project or competition can't keep my attention for long 😅 🔍🔍🔍 (why? from part of a conversation I had with Wasim)

    Why is that? First, everything I learn is earned by implementing and verifying the code myself, so every bit of learning is a reward. Second, every bit of learning is applied to a real dataset with a real-world problem, which assures me the technique I learnt is useful in the real world. Third, by focusing on a project/comp, a pipeline can be finished with a handful of learnt techniques, which gives a nice feeling of completion (of course, there will be a lot more iterations); with a course, it could take months to absorb everything inside and still not feel capable of doing things in the real world.

  • 2023.1.14 I watched [[00fastainbs/my_journey_on_kaggle/Learn from Radek#^c32318|Radek's intro to otto comp]]; it's amazing how many [[OTTO Recsys Comp (New)#^4a8749|new insights and ideas (todos)]] can come out of notebooks (which I thought I knew) when listening to a great guide (Radek) talk about them 🔥🔥🔥💡💡💡
  • 📙📙📙 I am learning and sharing everything about ML/DL openly online, holding nothing back. This makes everyone my friend and reveals my only true deadly enemy: Mr./Ms. Giving-It-Up. (Wow, quite a number of likes and follows from this tweet)
  • 🎉🎉🎉 The first discussion and notebook on Kaggle that I am proud to present; see this tweet
  • 🔥🔥🔥⚡⚡⚡2023.1.15 I have not finished his video but can't wait to write down Chris Deotte's secret to becoming a grandmaster: a true love of doing EDA on datasets (where many inspirations come from) + good validation + fast iterations
  • 🎉🎉🎉 What a happy day: I became a Kaggle Notebook Expert today!
  • 2023.1.16 I have more or less finished today's task, but I did not record/update my learnings while rewriting Radek's covisitation matrix notebook in polars, because I was too consumed by the problems I was dealing with. That's not good; I need to find a way to record my learning while I am solving problems. 😱😱😱😱😱
  • 2023.1.17 I should write down my goals specifically for each project (like rewriting this notebook), as Wasim advised (just keeping the goals in mind won't work in practice; they have to be in the notebook where I am working), and remind myself of those exact goals whenever I hit a wall with a detailed problem; otherwise I will spend too much time on less important smaller tasks without making progress on the really important ones!!! 😭😭😭⚡⚡⚡
  • 2023.1.18 morning, found another super amazing Kaggler who joined Kaggle only 2 weeks ago, has done one competition, and has already won 3 Expert titles; most importantly, the way this person shares is also very beneficial to beginners. kaggle post and his Titanic github 🔥🔥🔥🔥🔥🔥🔥, how he tackles an ML/DL notebook 🔥💥🔥💥🔥💥
  • 2023.1.18 I found this post 🔥🔥🔥🔥 on the otto comp; it is both amazing (it summarizes many important posts) and overwhelming (if I were to implement them all). However, I should stop panicking or feeling overwhelmed; just keep moving as fast as I can and making progress every day. That's enough for me. 😂😂😂
  • 2023.1.19 Sometimes I fall into the trap of worrying that what I learn doing Kaggle is too unsystematic and won't help me in the long run. Solution: perseverance will remove all the unnecessary and unfounded worries I have or will have. ⚡⚡⚡💡💡💡
  • I love the idea of a [[00fastainbs/my_journey_on_kaggle/Learn from Radek#^c04b53|mastermind group]] introduced by Radek. One appealing benefit of having such a group is that reflecting for the group can be a strong motivation to do the reflection more seriously. But that seems to make such a group a prerequisite for the reflection work, and as Radek said, you can't force people to join your mastermind group. So, how can I have my mastermind group right now, without forcing it, in order to do my reflection well? The solution is to tell myself that my mastermind group is out there online, checking my work on Twitter, Kaggle, and the fastai forum with or without informing me, and all I need to do is work and share diligently with them as the audience in my mind. 💡💡🎉🎉
  • I have been worrying about how to share my learning nicely on Twitter and Kaggle. I organize my work mainly in Obsidian notes and then push them to my repo, but sharing notes on Twitter and the Kaggle forum is not straightforward and can take extra time, which feels unnecessary. Now, using Radek's strategy of constraints (and how he applied it to building his YouTube channel), a simple solution should suffice: grow my reflections by adding tweets or updating Kaggle posts with images and links to the relevant places in my repo.
  • 2023.1.19: playground video done; put together the evaluation script in one notebook for otto done; solved the problem of sharing reflections on what I learnt done
  • 2023.1.20: set learning goals for Deotte's co-visitation candidate rerank model #todo
    • run the new Playground Series and check out LightGBM on categorical columns in both polars and pandas, with detailed comments
    • do a reflection for the Playground Series
    • learn to use cuDF the way I learnt polars, so that I can train on GPU (much faster than polars on CPU, based on my experience so far)
    • gain a better understanding of using co-visitations to generate candidates, and create hand-crafted rules for reranking (potentially used as features)
    • learn how to split a large dataset to train models without blowing out GPU and CPU RAM
  • 2023.1.21: inspiration from Radek's newsletter from last week:
    • Many activities are worth doing, worth participating in, even if you never stand a chance to be the best in the world at them
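The co-visitation idea in the 2023.1.20 goals can be sketched in plain Python. This is a minimal illustration with made-up session data, not Chris Deotte's actual pipeline (which works on the full OTTO click log with time weighting):

```python
from collections import Counter, defaultdict
from itertools import permutations

# Toy click sessions: each session is the list of item ids a user visited.
sessions = [
    [1, 2, 3],
    [2, 3, 4],
    [1, 3],
]

# Co-visitation matrix: for every ordered pair of distinct items seen in the
# same session, count how often they co-occur.
covisit = defaultdict(Counter)
for session in sessions:
    for a, b in permutations(set(session), 2):
        covisit[a][b] += 1

def candidates(item, k=3):
    """Return up to k items most often co-visited with `item` (the candidate set
    that a rerank model would then score)."""
    return [other for other, _ in covisit[item].most_common(k)]

print(candidates(3))  # items most frequently seen alongside item 3
```

A reranker would then take these candidates and order them with hand-crafted rules or learned features, which is exactly the candidate-generation + rerank split the goal above refers to.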

  • 2023.1.22: Trying to build a template for tabular classification and regression competitions; see the Twitter thread here
  • 2023.1.23: Learnt how to do hyperparameter search from a shared notebook. I started doing some research on GitHub Copilot.
  • 2023.1.24: I have paused my Kaggle work since yesterday, mainly for a 2-day project: whether and how to get started working with GitHub Copilot efficiently [[Learn about chatGPT, codex, vscode]].
    • I have done some research, but I don't find the current Copilot super useful for my needs at the moment. vscode is useful, though, and I still have Copilot installed to try it out.
    • However, in the process I found out how powerful the chatGPT and Codex models are, and I suddenly have a strong urge to do something useful with them for myself.
    • Although I integrated a new notebook that taught me how to do random search for hyperparameters 🎉🔥 into my template last night (though I didn't rewrite it 😔), maybe partly because of this urge to try out GPT-3, I feel I am losing momentum and steam on Kaggle today 😱😱😱. I fear it may sink my commitment to learning on Kaggle. I am struggling to find a compromise between keeping my momentum on Kaggle and trying out GPT-3.
  • 2023.1.26 redefined my learning goals
    • learn to use chatGPT Pro or the OpenAI Playground as my coding tutor
      • how does chatGPT Pro differ? sounds good, judging by the answers from chatGPT
    • tried using chatGPT to explain the code of the merlin.model getting-started notebook; it works great except for suggesting code [[How to make GPT-3 my coding tutor]]
      • Copilot can't give me correct code either, maybe because I am too unfamiliar with the library whose code I am asking about
  • 2023.1.27 tasks today
    • use a chatbot to learn the LangChain library
    • read the LangChain docs to verify the chatbot's answers
    • try to understand how YOLOPandas was built and how it works
  • 2023.2.3 Self-encouragement: don't be afraid of slowness at first, because as you compound your learning, you will get fast
  • 2023.2.6 I have spent the last 7 days working on chat_LangChain and accomplished the following:
    • understood what each line does
    • shared how to run the app on Hugging Face and on a local machine
    • shared my understanding of what each part of the codebase does and how the parts interact with each other
    • experimented with the prompt template to see the magic of prompt engineering
    • applied the chatbot to a different external data source: a webpage book
  • today's todos
    • use chat_LangChain to test whether OpenAI's requests/min limit actually affects Weaviate getting all the documents it needs
      • to achieve the above, I will try giving it a new Weaviate instance done
      • I can't verify the OpenAI queries/min limit on Hugging Face with hwchase17's demo. So, this is done, I think
    • paper-qa codebase exploration
    • prompt exploration and collection
    • I put all of last week's work on chat_LangChain in this interconnected tweet
  • 2023.2.25
    • I have not journaled for a while. I think the inner fear and laziness won over the last two weeks. I felt defeated reading the DSP paper because of the slow going (even though I learnt quite a lot, and it is a very well written and beginner-friendly paper). I also felt defeated because the DSP source code was quite hard to understand and progress was slow. All these defeated feelings kept me from putting them down into words, let alone letting others know through Twitter.
    • However, this kind of defeat will be normal as long as I keep learning, so I must get used to it. In fact, the only way to get over those defeated feelings is to hang in there and, no matter how slowly I move, think about or reflect on the problems every day. As days go by, the average human mind will connect the dots for me, and then I will see apparent progress, no matter how small it may seem later on (which itself is again a defeated feeling in the future)
    • In fact, why do such defeated feelings exist? Isn't it normal...
    • Now, I still have not finished reading and experimenting with the DSP source code, but I can see progress that seemed so hard 2 days ago. I have figured out how to experiment with the source code without messing up the cached examples, and I have experimented enough to confirm the usefulness of GitHub Copilot in helping me learn and comment on source code
  • 2023.2.26
    • any more best practices (like opening up to 20 files of relevant source code) for using GitHub Copilot; see tweet
    • the experience of using Copilot to read and comment on source code is getting better and better
    • installed Codeium as a free alternative to GitHub Copilot (to use only when Copilot is not available)
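The OpenAI requests/min limit mentioned in the 2023.2.6 todos is usually handled with retry-plus-exponential-backoff. Below is a generic sketch in plain Python; `RateLimitError` and `flaky_call` are placeholders I made up for illustration, not the actual OpenAI client:

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for whatever rate-limit error the API client raises."""

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Run `call()`, retrying with exponential backoff on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            # Sleep base, 2*base, 4*base, ... plus jitter to spread out retries.
            time.sleep(base_delay * 2 ** attempt + random.random() * 0.1)

# Toy usage: a call that hits the rate limit twice before succeeding.
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError()
    return "ok"

print(with_backoff(flaky_call, base_delay=0.01))  # prints "ok" after 2 retries
```

With a wrapper like this, a batch indexing job (e.g. pushing documents into a Weaviate instance through an embedding API) degrades to waiting instead of dropping documents when the per-minute cap is hit.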

To read more of the ups and downs of my journey, see here.
