# Tips on how to implement a paper in PyTorch


1) [PyTorch Paper Replicating from Zero to Mastery Learn PyTorch for Deep Learning](https://www.learnpytorch.io/08_pytorch_paper_replicating/)
    - Note: 
        - Paper: [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (the paper that introduced the ViT architecture)](https://arxiv.org/pdf/2010.11929.pdf)
        - [YouTube Video](https://www.learnpytorch.io/08_pytorch_paper_replicating/)
        
2) Miscellaneous: 
    * [Learn To Reproduce Papers: Beginner's Guide](https://towardsdatascience.com/learn-to-reproduce-papers-beginners-guide-2b4bff8fcca0)
    * [Tips on how my Git repo will look when I start implementing papers](https://github.com/jaygala24/pytorch-implementations)

We're going to focus on recreating the Vision Transformer (ViT) computer vision architecture and applying it to our FoodVision Mini problem to classify different images of pizza, steak and sushi. 

The goal of __replicating or implementing__ a machine learning paper is to apply the most recent techniques to your own problems, which makes it a valuable skill to have. 

In other words, ML paper replicating involves turning an ML paper made up of images/diagrams, math and text into usable code; in our case, usable PyTorch code.

A machine learning paper is usually divided into the following parts: 

* __Abstract__: An overview/summary of the paper's main findings/contributions.
* __Introduction__: This section answers the following questions: 
    - What's the paper's main problem and details of previous methods used to try and solve it?
* __Method__: This section answers the following questions: 
    - How did the researchers go about conducting their research? 
    - e.g., What model(s), data sources and training setups were used? 
* __Results__: This section answers the following questions: 
    - What are the outcomes of the paper? 
    - e.g., If a new model or training setup was used, how did the findings compare to previous works? (This is where _experiment tracking_ comes in handy.)
* __Conclusion__: This section answers the following questions: 
    - What are the limitations of the suggested methods? 
    - What are some next steps for the research community?
* __References__: This section answers the following questions: 
    - What resources/other papers did the researchers look at to build their own body of work? 
* __Appendix__: 
    - Are there any extra resources/findings to look at that weren't included in any of the above sections?


__Machine Learning Engineer (Motto)__: 

1) Download a paper
2) Implement it
3) Keep doing this until you have skills

A __Transformer architecture__ is generally considered to be any neural network that uses the _attention mechanism_ as its primary learning layer, similar to how a convolutional neural network uses convolutions as its primary learning layer.
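
To make the _attention mechanism_ a little more concrete, here is a minimal sketch of scaled dot-product self-attention in PyTorch. The function name `scaled_dot_product_attention` and the tensor sizes are assumptions chosen for illustration; a real Transformer layer also adds learned query/key/value projections and multiple heads, which are left out here.

```python
import math

import torch


def scaled_dot_product_attention(q, k, v):
    """Illustrative helper (hypothetical name): q, k, v have shape (batch, seq_len, dim)."""
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled by sqrt(dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, seq_len, seq_len)
    # Turn similarities into weights that sum to 1 over the keys
    weights = torch.softmax(scores, dim=-1)
    # Each output token is a weighted average of the value vectors
    return weights @ v, weights


torch.manual_seed(0)
x = torch.randn(2, 5, 8)  # assumed sizes: 2 sequences, 5 tokens, 8 features
out, weights = scaled_dot_product_attention(x, x, x)  # self-attention: q = k = v
print(out.shape, weights.shape)
```

The output keeps the input shape, while `weights` holds one row of attention weights per token.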

Like the name suggests, __the Vision Transformer (ViT) architecture was designed to adapt the original Transformer architecture to vision problem(s)__ (classification was the first, and many others have followed since). 
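
One way to picture this adaptation is the step that turns an image into a sequence of "words" (patches) that a Transformer can consume. The sketch below is an assumption-laden illustration, not the paper's code: the class name `PatchEmbedding` is hypothetical, and the 224x224 image size, 16x16 patch size and 768-dimensional embedding are assumed values. It uses a strided convolution, which is equivalent to flattening each patch and passing it through a shared linear layer.

```python
import torch
from torch import nn


class PatchEmbedding(nn.Module):
    """Hypothetical helper: splits an image into patches and embeds each one."""

    def __init__(self, in_channels=3, patch_size=16, embed_dim=768):
        super().__init__()
        # kernel_size == stride means every patch is seen exactly once, with no overlap
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)          # (batch, embed_dim, H / patch, W / patch)
        x = x.flatten(2)          # (batch, embed_dim, num_patches)
        return x.transpose(1, 2)  # (batch, num_patches, embed_dim)


images = torch.randn(2, 3, 224, 224)  # assumed input: 2 RGB images of 224x224
patches = PatchEmbedding()(images)
print(patches.shape)  # (224 / 16) ** 2 = 196 patches per image
```

The resulting `(batch, num_patches, embed_dim)` tensor has the same layout as a batch of token embeddings in a text Transformer.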

The original Vision Transformer has been through several iterations over the past couple of years. However, we're going to focus on replicating the original, otherwise known as the "vanilla Vision Transformer", because if you can recreate the original, you can adapt to the others.