# 🤗 Tutorial: Discovering Pretrained Models on Hugging Face

> *A concise guide to navigating and selecting models from the Hugging Face Model Hub*  
Instructor: Yasmine Houri (yasmine.houri@ensae.fr)


## What is Hugging Face?

Hugging Face, Inc. is a French-American company based in New York City that develops computation tools for building applications using machine learning. It is most notable for its transformers library built for natural language processing applications and its platform that allows users to share machine learning models and datasets and showcase their work.

### History

The company was founded in 2016 by French entrepreneurs Clément Delangue, Julien Chaumond, and Thomas Wolf in New York City, originally as a company that developed a chatbot app targeted at teenagers. The company was named after the 🤗 hugging face emoji (U+1F917). After open sourcing the model behind the chatbot, the company pivoted to focus on being a platform for machine learning.

(Source: [Wikipedia](https://en.wikipedia.org/wiki/Hugging_Face))

Hugging Face is the reference hub for open-source datasets, tools, and pretrained models. It hosts over 900,000 models, 200,000 datasets, and 300,000 demo applications, all designed to support collaborative and accessible machine learning. Although it is strongly focused on NLP, it also covers audio, video, and multimodal tasks. Let's learn how to interact with it!

## Exploring Models on the Hugging Face Hub

Let's visit the online [Hugging Face Hub](https://huggingface.co/models) together!

## Trying Models in Python using ```pipeline```

In Python, the main way to load and run Hugging Face models is the ```transformers``` library.

> **Transformers** is a library of pretrained natural language processing, computer vision, audio, and multimodal models for inference and training.  
> Transformers provides everything you need for inference or training with state-of-the-art pretrained models.  
> Use Transformers to train models on your data, build inference applications, and generate text with large language models.  
>  
> — _Source: [Hugging Face Transformers Documentation](https://huggingface.co/docs/transformers/index)_


### Pipeline

**Pipeline** is a ready-to-use, integrated API for performing a wide range of machine learning tasks with any compatible model from the Hugging Face Hub.  
Transformers provides two kinds of pipeline class:
- a generic [`Pipeline`](https://huggingface.co/docs/transformers/v4.52.1/en/main_classes/pipelines#transformers.Pipeline) class
- individual task-specific classes (e.g. [TextGenerationPipeline](https://huggingface.co/docs/transformers/v4.52.1/en/main_classes/pipelines#transformers.TextGenerationPipeline))

In [None]:
# Install packages (uncomment if necessary)
# !pip install transformers

In [None]:
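from transformers import pipeline

# Load a small text-generation model from the Hub (downloaded on first use)
generator = pipeline(task="text-generation", model="distilgpt2")
# Alternative: a larger model, with the output capped at 20 new tokens
# generator = pipeline(task="text-generation", max_new_tokens=20, model="tiiuae/falcon-rw-1b")
generator("The secret to baking a really good cake is")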

Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'The secret to baking a really good cake is the ability to change an egg, and when you bake it, it will just take a while for it to reach equilibrium. I would really love to see more ways to change things, but now I have'}]

You can also pass several prompts at once as a list; the result then contains one list of outputs per prompt:

In [None]:
generator(["The secret to baking a really good cake is ", "A baguette is "])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[[{'generated_text': 'The secret to baking a really good cake is iced in vanilla yogurt, iced from almond milk, iced from chocolate milk and a small pinch of lemon juice.\nThis cake is made with iced apple cider vinegar and with the help of'}],
 [{'generated_text': "A baguette is iced cream that will give you a warm and comforting impression of the flavor buds.\n\n\nWhat's this?\nI'm a little confused, but a cream of chocolate with a really good flavor will quickly enhance the"}]]
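
The outputs above are nested Python structures: a list of dicts for a single prompt, and a list of lists of dicts for several prompts. Below is a minimal sketch of how one might pull the plain strings out of either shape. The helper names `extract_texts` and `strip_prompt` are hypothetical (they are not part of `transformers`), and the sample data is a shortened copy of the outputs shown above.

```python
def extract_texts(outputs):
    """Flatten text-generation output into a plain list of strings.

    Accepts both shapes shown above: a list of dicts (one prompt) or a
    list of lists of dicts (several prompts).
    """
    texts = []
    for item in outputs:
        candidates = item if isinstance(item, list) else [item]
        texts.extend(c["generated_text"] for c in candidates)
    return texts


def strip_prompt(prompt, generated_text):
    """Return only the continuation when the output starts with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text


# Shortened copies of the outputs above (truncated for readability).
single = [{"generated_text": "The secret to baking a really good cake is the ability to change an egg"}]
batch = [
    [{"generated_text": "The secret to baking a really good cake is iced in vanilla yogurt"}],
    [{"generated_text": "A baguette is iced cream"}],
]

print(extract_texts(single))
print([strip_prompt("A baguette is ", text) for text in extract_texts(batch)[1:]])
```

The generated text repeats the prompt, which is why `strip_prompt` is useful. The text-generation pipeline also documents a `return_full_text` argument that may serve the same purpose; check the documentation for your installed version.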

**Pipeline** can be used for various tasks. Let's try text summarization.

In [None]:
# Text summarization

generator = pipeline(task="summarization", model="facebook/bart-large-cnn")

text = """
From June 23rd to July 3rd, 2025 the Institut Polytechnique de Paris will host the Summer Institute in Computational Social Science. It will take place at ENSAE, 5 Avenue Henri le Chatelier, Palaiseau, France (accepted applicants will receive an email with detailed pratical information about accomodation and how to reach the venue). This has been made possible by the generous support of SICSS, the Templeton Fondation, CREST, and Hi!Paris. The purpose of the Summer Institute is to bring together scholars interested in computational social science. The Summer Institute is for both social scientists (broadly conceived) and data scientists (broadly conceived).

The Summer Institute is open to social scientists, computer scientists, and a few seats could be reserved for people working professionally at this intersection (such as data journalists) if applicable. Please note that although the first 5 days of SICSS-Paris 2025 will be held onsite, the 4 remaining days will be held remotely. This is to facilitate group work, and to foster inclusivity. The institute will involve lectures in the morning, lab sessions in the afternoon, and about 6 evening guest lectures. During the second week, the participants will take part in group work aimed at advancing a research project and attend remote guest lectures as well.

This year’s institute will focus on Large Language Models and Generative Artificial Intelligence. Sessions will take students all the way from an introduction to text analysis through to practical uses of and critical perspectives on deep learning for text analysis in the social sciences. Participants will have ample opportunities to discuss their ideas and research with the organizers, with other participants, as well as with guest speakers. Because we are committed to open and reproducible research, all materials created for the Summer Institute will be released open-source (find materials from the 2023 edition here).

Participation is restricted to advanced Ph.D. students, postdoctoral researchers, and junior faculty (within 7 years of their Ph.D). We welcome applicants from all backgrounds and fields of study, especially junior faculty from neighboring institutions near Palaiseau, France. About 25-30 participants will be invited. Participants are expected to fully attend and participate in the entire 9-day program, which includes 5 days onsite and 4 remote, but we are open to alternative arrangements for faculty members.
"""

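# max_length and min_length bound the summary length in tokens;
# do_sample=False disables random sampling, so the output is deterministic
summary = generator(text, max_length=150, min_length=40, do_sample=False)
print(summary[0]['summary_text'])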


Device set to use cpu


The Summer Institute in Computational Social Science will take place from June 23 to July 3, 2025. It is open to social scientists, computer scientists, and a few seats could be reserved for people working professionally at this intersection. This year’s institute will focus on Large Language Models and Generative Artificial Intelligence. Participants are expected to fully attend and participate in the entire 9-day program.


The ```pipeline``` function accepts a range of parameters. At a minimum, specify a ```task``` identifier or a ```model```; if you give only a task, a default model for that task is loaded. The ```input``` itself is passed when you call the resulting pipeline object, as in the examples above. To see the other parameters, visit this [page](https://huggingface.co/docs/transformers/v4.52.2/en/main_classes/pipelines#transformers.pipeline).
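
As an illustration of how parameters can be set at two different moments, here is a small sketch: some are given when the pipeline is built (here `model` and `device`), others when it is called (here `max_new_tokens` and `do_sample`). The helper names `build_generator` and `generate`, and the default values in `GENERATION_KWARGS`, are assumptions made for this example, not part of the library.

```python
# Default call-time settings for this sketch (illustrative values).
GENERATION_KWARGS = {"max_new_tokens": 20, "do_sample": False}


def build_generator(model_name="distilgpt2", device=-1):
    """Build a text-generation pipeline; device=-1 selects the CPU."""
    # Imported lazily so that this cell can be defined without loading a model.
    from transformers import pipeline

    return pipeline(task="text-generation", model=model_name, device=device)


def generate(generator, prompt, **overrides):
    """Call a pipeline with the defaults above, letting the caller override them."""
    kwargs = {**GENERATION_KWARGS, **overrides}
    return generator(prompt, **kwargs)


# Usage (uncomment to download the model and run it):
# generator = build_generator()
# print(generate(generator, "A baguette is", max_new_tokens=10))
```

Keeping call-time settings in one dictionary makes it easier to compare several models under the same generation settings.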

## How to choose a good model?

> Already know which model to use? Great — go ahead and load it!  
> Not sure which model to choose?  👉 Explore the [Hugging Face Hub](https://huggingface.co/) to find the right model for your task.

Consider:

- ✅ Task suitability

- 🧪 Performance (look at evaluation metrics)

- 📜 License (especially for commercial use)

- 🧠 Model size vs. speed tradeoffs
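
Some of these criteria can also be checked from Python rather than in the browser. The sketch below assumes the separate `huggingface_hub` package is installed and that `HfApi.list_models` accepts `filter`, `sort`, and `limit` and returns objects with `id`, `downloads`, and `tags` attributes in your installed version; these details have changed between releases, so treat them as assumptions. The helper names `license_from_tags` and `candidate_models` are hypothetical.

```python
def license_from_tags(tags):
    """Return the license named in a model's Hub tags, or None if absent.

    Hub models usually carry a tag such as "license:apache-2.0".
    """
    for tag in tags or []:
        if tag.startswith("license:"):
            return tag.split(":", 1)[1]
    return None


def candidate_models(task, limit=5):
    """List a few Hub models for a task as (id, downloads, license) tuples."""
    # Imported lazily; requires network access when called.
    from huggingface_hub import HfApi

    api = HfApi()
    # Ask the Hub to sort by download count (assumed parameter names).
    models = api.list_models(filter=task, sort="downloads", limit=limit)
    return [(m.id, m.downloads, license_from_tags(m.tags)) for m in models]


# Usage (uncomment to query the Hub):
# for model_id, downloads, license_name in candidate_models("summarization"):
#     print(model_id, downloads, license_name)
```

Download counts and license tags are only a first filter; evaluation metrics and model size still need to be read from each model card.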


## 🚀 Your Turn!

Choose any model from the 🤗 Hugging Face Hub and use the `pipeline` interface to run a task of your choice.

### 🧠 Suggested Tasks:
- Sentiment analysis
- Text generation
- Named entity recognition
- Text summarization
- Translation

---

### ✅ Instructions:
1. Visit [huggingface.co/models](https://huggingface.co/models) and pick a model.
2. Use the `pipeline` from `transformers` to load and apply the model.
3. Try it out on your own input data and display the results!

> 📝 *Example*: Use a sentiment analysis model to analyze the tone of a paragraph.
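
One possible starting point for this example is sketched below. The model name is only a suggestion (a widely used English sentiment model on the Hub); any sentiment analysis model from the Hub can be substituted. The helper names `analyze_tone` and `describe` are hypothetical.

```python
def analyze_tone(paragraph, model_name="distilbert-base-uncased-finetuned-sst-2-english"):
    """Run a sentiment-analysis pipeline on one paragraph."""
    # Imported lazily; the model is downloaded on first use.
    from transformers import pipeline

    classifier = pipeline(task="sentiment-analysis", model=model_name)
    return classifier(paragraph)


def describe(results):
    """Format results such as [{'label': ..., 'score': ...}] as short strings."""
    return [f"{r['label']} ({r['score']:.2f})" for r in results]


# Usage (uncomment to download the model and run it):
# results = analyze_tone("I really enjoyed the lab sessions this week.")
# print(describe(results))
```

The label names depend on the chosen model, so check its model card before interpreting the output.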


# 🎉 You’re now all set to work with Hugging Face models throughout the Summer School. Happy coding!