This repository is a hands-on exploration of building Large Language Models (LLMs) from scratch using Python and PyTorch. The goal is to understand the fundamentals of language modeling, tokenization, embeddings, and transformer architectures through practical coding exercises.
```
llms-from-scratch/
├── data/
│   └── the-verdict.txt                      # Raw text data
├── notebooks/                               # Jupyter notebooks with step-by-step experiments
│   ├── 01 - Basic NN - Revision/
│   │   └── 01_neural_network_basics.ipynb
│   ├── 02 - Working with Text Data/
│   │   ├── 01_tokenization_learning_module.ipynb
│   │   └── 02_main_dataloader.ipynb
│   └── 03 - Attention Mechanisms/
│       └── 01_attention_basics.ipynb
├── .gitignore
├── README.md                                # This file
└── requirements.txt
```
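To preview the kind of tokenization the text-data notebooks build toward, here is a minimal whitespace-and-punctuation tokenizer sketch. The function names are illustrative, not taken from the notebooks:

```python
import re

def simple_tokenize(text):
    # Split on punctuation and whitespace, keeping punctuation as tokens;
    # pure-whitespace and empty pieces are dropped by the filter below.
    parts = re.split(r'([,.:;?_!"()\']|--|\s)', text)
    return [p for p in parts if p.strip()]

def build_vocab(tokens):
    # Map each unique token to a stable integer id.
    return {tok: i for i, tok in enumerate(sorted(set(tokens)))}

text = "Hello, world. This is a test."
tokens = simple_tokenize(text)
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]
```

The same two steps — split text into tokens, then map tokens to integer ids — are the starting point for every experiment in the repository.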
- Clone the repository:

  ```
  git clone https://github.com/your-username/llms-from-scratch.git
  cd llms-from-scratch
  ```

- Install dependencies:

  ```
  pip install -r requirements.txt
  ```

- Open the notebooks in `notebooks/` to explore step by step.
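Once text is mapped to token ids, the dataloader notebook turns them into training pairs. A pure-Python sketch of the sliding-window idea (a simplified stand-in for a PyTorch `Dataset`; the function name and `stride` parameter are illustrative):

```python
def sliding_windows(token_ids, context_size, stride=1):
    # Pair each input chunk with the same chunk shifted one position
    # right, so every position learns to predict the next token.
    pairs = []
    for i in range(0, len(token_ids) - context_size, stride):
        x = token_ids[i : i + context_size]
        y = token_ids[i + 1 : i + context_size + 1]
        pairs.append((x, y))
    return pairs

# sliding_windows([1, 2, 3, 4, 5], context_size=2) yields
# ([1, 2], [2, 3]), ([2, 3], [3, 4]), ([3, 4], [4, 5])
```

In the notebooks this logic is wrapped in a PyTorch `Dataset`/`DataLoader` so the pairs can be batched as tensors.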
Feel free to fork, experiment, and open pull requests. Discussions about ideas and improvements are welcome.