## GitHub CI-Compass Organization
- [GitHub Link: https://github.com/ci-compass/AI-working-scientist/tree/main](https://github.com/ci-compass/AI-working-scientist/tree/main)

# A guide to Generative AI for the Working Scientist

> This video is a rough guide to some of the concepts behind generative AI, prepared for the NSF CyberInfrastructure Center of Excellence [CI-Compass](https://ci-compass.org/) [Virtual Workshop - AI Meets CI: Intelligent Infrastructure for Major & Midscale Facilities](https://ci-compass.org/news-and-events/events/virtual-workshop-ai-meets-ci-intelligent-infrastructure-for-major-and-midscale-facilities/). The purpose is to start from the beginning and demystify **chatbot**-based generative AI.

![](https://ci-compass.org/assets/629872/300x/ai_meets_ci_recreation_1.png)

## Chat based generative AI

### Chat interface
- [OpenAI ChatGPT](https://chatgpt.com/)
- [Anthropic Claude](https://claude.ai/new)
- [Google Gemini](https://gemini.google.com/)

### Terminal Based Coding Agents
- [Anthropic Claude Code CLI](https://www.claude.com/product/claude-code)
- [OpenAI Codex CLI](https://github.com/openai/codex)
- [Google Gemini CLI](https://geminicli.com/)
- [Microsoft Visual Studio Code](https://code.visualstudio.com/)

### Notebook Based AI
- [AnswerAI SolveIT](https://solve.it.com)
- [Google Colab](https://developers.googleblog.com/en/data-science-agent-in-colab-with-gemini/)

## Andrej Karpathy "Deep Dive into LLMs like ChatGPT"
![](./images/karpathy-yt.png)

- [Deep Dive into LLMs like ChatGPT: https://youtu.be/7xTGNNLPyMI](https://youtu.be/7xTGNNLPyMI)

## Andrej Karpathy "How I use LLMs"
![](./images/karpathy-yt-use.png)

- [How I use LLMs YT link: https://youtu.be/EWvNQjAaOHw](https://youtu.be/EWvNQjAaOHw)

## "Foundation Models"

- [What are Foundation Models? https://aws.amazon.com/what-is/foundation-models/](https://aws.amazon.com/what-is/foundation-models/)
- Notion of "Transfer Learning": you can start from a pretrained network and adapt it to a new application or task.
    - [J. Howard and S. Ruder, “Universal Language Model Fine-tuning for Text Classification,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2018, pp. 328–339. https://aclanthology.org/P18-1031/](https://aclanthology.org/P18-1031/)
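
The transfer learning idea above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the ULMFiT method from the cited paper (which fine-tunes the whole network in stages): `FROZEN_WEIGHTS`, `pretrained_features`, and the tiny dataset are all hypothetical stand-ins. The "pretrained" part stays frozen, and only a small new logistic-regression head is trained for the new task.

```python
import math

# Hypothetical stand-in for a pretrained network: a fixed feature extractor
# whose "weights" are frozen (never updated below).
FROZEN_WEIGHTS = ((1.0, 1.0), (1.0, -1.0))


def pretrained_features(x):
    """Map a raw input (x0, x1) to features using the frozen weights."""
    return [w0 * x[0] + w1 * x[1] for w0, w1 in FROZEN_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(inputs, labels, lr=0.1, epochs=200):
    """Train only a new logistic-regression head on top of the frozen features."""
    feats = [pretrained_features(x) for x in inputs]
    w, b = [0.0, 0.0], 0.0
    n = len(feats)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(w[0] * f[0] + w[1] * f[1] + b) - y
            grad_w[0] += err * f[0]
            grad_w[1] += err * f[1]
            grad_b += err
        # Full-batch gradient descent step on the head parameters only.
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b


def predict(head, x):
    w, b = head
    f = pretrained_features(x)
    return 1 if sigmoid(w[0] * f[0] + w[1] * f[1] + b) >= 0.5 else 0


# New downstream task (toy data): the label is 1 when x0 > x1, else 0.
train_x = [(2, 1), (3, 0), (1, 2), (0, 3), (4, 1), (1, 4)]
train_y = [1, 1, 0, 0, 1, 0]

head = train_head(train_x, train_y)
accuracy = sum(predict(head, x) == y for x, y in zip(train_x, train_y)) / len(train_x)
print("training accuracy:", accuracy)
```

The design choice to notice is that the new task needs very little data and compute, because the frozen features already carry most of the useful structure. Real fine-tuning usually also updates some or all of the pretrained weights.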

<hr>

- **Multimodal "Text" models:** GPT-5; Anthropic Claude Opus, Sonnet; Google Gemini 2.5; Meta Llama; DeepSeek R1
- **Image or Video Generation from text:** OpenAI Sora, [Google Imagen](https://imagen.research.google/)
- **Science Foundation Models:** [Futurehouse ether0](https://www.futurehouse.org/research-announcements/ether0-a-scientific-reasoning-model-for-chemistry)
    - [M. Hu et al., “A survey of scientific Large Language Models: From data foundations to agent frontiers,” arXiv [cs.CL], 28-Aug-2025. https://arxiv.org/abs/2508.21148](https://arxiv.org/abs/2508.21148)
    - [Towards a Physics Foundation Model https://arxiv.org/html/2509.13805v2](https://arxiv.org/html/2509.13805v2)
    - [H. Cui et al., “Towards multimodal foundation models in molecular cell biology,” Nature, vol. 640, no. 8059, pp. 623–633, Apr. 2025. https://www.nature.com/articles/s41586-025-08710-y](https://www.nature.com/articles/s41586-025-08710-y)
    - [NeurIPS 2024: Foundation Models for Science: Progress, Opportunities, and Challenges](https://neurips.cc/virtual/2024/workshop/84714)




## "Autocomplete on Steroids"
- [Vaswani et al., "Attention Is All You Need," https://arxiv.org/abs/1706.03762](https://arxiv.org/abs/1706.03762)
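
To make the "autocomplete" half of the phrase concrete, here is a minimal sketch of literal autocomplete: a bigram model that predicts the next word purely from counts of which word followed which. The corpus and the function names (`train_bigram`, `predict_next`, `autocomplete`) are made up for illustration. An LLM is trained on the same kind of next-token objective, but it replaces the count table with a transformer conditioned on the whole preceding context.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    model = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def autocomplete(model, start, max_words=5):
    """Greedily extend `start` one predicted word at a time."""
    output = [start]
    for _ in range(max_words):
        nxt = predict_next(model, output[-1])
        if nxt is None:
            break
        output.append(nxt)
    return " ".join(output)


# Toy corpus, chosen only for illustration.
corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram(corpus)
print(predict_next(model, "the"))
print(autocomplete(model, "the", max_words=3))
```

A model like this can only replay word pairs it has already seen, which is the baseline the rest of this section contrasts with.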

<hr>

<img src="https://arxiv.org/html/1706.03762v7/Figures/ModalNet-21.png" width="600" />
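
The core operation inside each attention block of the architecture in the figure is scaled dot-product attention, softmax(QKᵀ / √d_k)V, as defined in the Vaswani et al. paper. Below is a minimal single-head sketch in plain Python. It omits the learned projections, multiple heads, and masking, and the toy `Q`, `K`, `V` values are assumptions chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of numbers."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Toy example: 2 query tokens, 3 key/value tokens, dimension 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out, weights = attention(Q, K, V)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and says how much one token "looks at" every other token. Unlike the bigram table, these weights are computed from the content of the tokens themselves.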


### "Grokking" -- Memorization vs Generalization
> "Out of Distribution Phenomenon" where a model initially overfits to the training set and then, after continued training, suddenly generalizes to unseen test samples

![An interactive introduction to grokking and mechanistic interpretability](https://res.cloudinary.com/lesswrong-2-0/image/upload/f_auto,q_auto/v1/mirroredImages/XpCnhaAQrssq8tJBG/rfpm8jhcd5kog1mqi8jn)

- [An interactive introduction to grokking and mechanistic interpretability](https://www.alignmentforum.org/posts/XpCnhaAQrssq8tJBG/an-interactive-introduction-to-grokking-and-mechanistic)
- [Do Machine Learning Models Memorize or Generalize? https://pair.withgoogle.com/explorables/grokking/](https://pair.withgoogle.com/explorables/grokking/)
- [Anthropic Transformer Circuits](https://transformer-circuits.pub/)
- [Y. Tian, “Provable scaling laws of feature emergence from learning dynamics of grokking,” arXiv [cs.LG], 30-Sept-2025. https://arxiv.org/html/2509.21519v3](https://arxiv.org/html/2509.21519v3)
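
The difference between memorizing and generalizing can be shown with modular addition, a task commonly used in grokking studies. The sketch below does not train a network or reproduce grokking dynamics. It only contrasts two hypothetical end states: a `memorizer` that is a lookup table over the training pairs, and a `generalizer` that stands in for a model that has learned the rule. The modulus `P = 7` and the train/test split are arbitrary choices for illustration.

```python
P = 7  # modulus for the toy task (a + b) mod P

pairs = [(a, b) for a in range(P) for b in range(P)]
# Deterministic split: every 4th pair is held out as unseen test data.
train = [p for i, p in enumerate(pairs) if i % 4 != 0]
test = [p for i, p in enumerate(pairs) if i % 4 == 0]


def target(a, b):
    return (a + b) % P


# "Memorizer": a lookup table of training answers, with a blind guess otherwise.
table = {(a, b): target(a, b) for a, b in train}


def memorizer(a, b):
    return table.get((a, b), 0)


def generalizer(a, b):
    """Stands in for a model that has learned the underlying rule."""
    return (a + b) % P


def accuracy(model, data):
    return sum(model(a, b) == target(a, b) for a, b in data) / len(data)


for name, model in [("memorizer", memorizer), ("generalizer", generalizer)]:
    print(name, "train:", accuracy(model, train), "test:", round(accuracy(model, test), 3))
```

Both "models" are perfect on the training pairs, so training accuracy alone cannot tell them apart. Only the held-out pairs reveal which one has captured the rule, which is why grokking is tracked with test accuracy.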

What makes the transformer architecture more than "autocomplete on steroids"? What don't we understand about the scale of the architecture?

##### 🤖Reply🤖<!-- SOLVEIT_SEPARATOR_7f3a9b2c -->

Great question! The transformer architecture is indeed often described as "autocomplete on steroids," but there's something deeper happening that we're still trying to understand.

The key mystery is **emergence** - capabilities that appear seemingly out of nowhere as models scale up. Small models do behave mostly like fancy autocomplete, but larger models start exhibiting behaviors they weren't explicitly trained for, like:

- Reasoning through multi-step problems
- Translating between languages they've barely seen
- Understanding analogies and abstract concepts
- Performing arithmetic (even though they're just predicting text!)

This connects to that "grokking" phenomenon you saw in the notes - where a model suddenly generalizes after appearing to just memorize. But at the scale of models like GPT-4 or Claude, we see this happening across many different capabilities simultaneously.

**What we don't understand:**

1. **Why certain capabilities emerge at specific model sizes** - there seem to be "phase transitions" where new abilities suddenly appear
2. **How the models actually represent knowledge internally** - the Anthropic "Transformer Circuits" work is trying to reverse-engineer this
3. **Whether there are fundamental limits** to what emerges with scale, or if we'll keep seeing new capabilities

Have you thought about what kinds of capabilities you'd want to investigate or understand better? Or are you curious about how researchers are trying to "look inside" these models?