# GPT-2 import and test, Dracolia group

Members:
- Bastien HOTTELET
- Pascal ZHAN
- Fatih FIDAN
- Lilian SOARES
- Tamij SARAVANAN
- Kévin POSTIC
- Evan PARIS

Date: 17/11/2023

Description: Import the pre-trained GPT-2 model from Keras NLP and run tests on it to understand how GPT-2 works.




## Installing Keras NLP, choosing the Keras backend, and importing the libraries

Before importing the model, Keras NLP must be installed.

In [1]:
!pip install git+https://github.com/keras-team/keras-nlp.git -q

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for keras-nlp (pyproject.toml) ... done


After running a few tests, we noticed that TensorFlow version **2.15.0** fails to detect the GPU. We therefore uninstalled it and installed version **2.14.0**, which works correctly.

The Colab session must then be restarted for the change to take effect.
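
One way to confirm whether TensorFlow sees a GPU, before and after the downgrade, is sketched below. The helper name `detect_gpus` is hypothetical and not part of the original notebook. It relies on `tf.config.list_physical_devices`, and it returns `None` when TensorFlow is not installed so that the cell still runs.

```python
def detect_gpus():
    """Return the names of the GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = detect_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment.")
elif gpus:
    print(f"GPU(s) detected: {gpus}")
else:
    print("No GPU detected; check the runtime type or the TensorFlow version.")
```

An empty list on a GPU runtime would match the behaviour described above for version 2.15.0.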

In [3]:
import tensorflow as tf
print(tf.__version__)
if tf.__version__ == "2.15.0":
  !pip uninstall tensorflow
  !pip install tensorflow==2.14.0

2.15.0
Found existing installation: tensorflow 2.15.0
Uninstalling tensorflow-2.15.0:
  Would remove:
    /usr/local/bin/estimator_ckpt_converter
    /usr/local/bin/import_pb_to_tensorboard
    /usr/local/bin/saved_model_cli
    /usr/local/bin/tensorboard
    /usr/local/bin/tf_upgrade_v2
    /usr/local/bin/tflite_convert
    /usr/local/bin/toco
    /usr/local/bin/toco_from_protos
    /usr/local/lib/python3.10/dist-packages/tensorflow-2.15.0.dist-info/*
    /usr/local/lib/python3.10/dist-packages/tensorflow/*
Proceed (Y/n)? Y
  Successfully uninstalled tensorflow-2.15.0
Collecting tensorflow==2.14.0
  Downloading tensorflow-2.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (489.8 MB)
Collecting tensorboard<2.15,>=2.14 (from tensorflow==2.14.0)
  Downloading tensorboard-2.14.1-py3-none-any.whl (5.5 MB)

For this SAE, we use **Keras Core**, configured to run on TensorFlow. Keras Core can also run on backends other than TensorFlow, such as PyTorch (torch) and JAX.

In [2]:
# Import packages; KERAS_BACKEND must be set before Keras is imported
import os

os.environ["KERAS_BACKEND"] = "tensorflow"

import keras_nlp
import tensorflow as tf
import keras_core as keras
import time

## Introduction to KerasNLP
Building large language models from scratch is complex and expensive. Fortunately, pre-trained language models are available for immediate use. KerasNLP provides a large number of pre-trained checkpoints that let you experiment with state-of-the-art models without training them yourself.

KerasNLP is a natural language processing library that supports users through their entire development cycle. It offers both pre-trained models and modular building blocks, so developers can easily reuse pre-trained models or build their own language models.

In summary, for generative language models, KerasNLP offers:

- Pre-trained models with a generate() method, for example keras_nlp.models.GPT2CausalLM and keras_nlp.models.OPTCausalLM.
- A Sampler class that implements generation algorithms such as Top-K, Beam, and contrastive search. These samplers can be used to generate text with custom models.
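
To give an intuition for what a Top-K sampler does, here is a minimal pure-Python sketch. It is not KerasNLP's implementation: the function `top_k_sample`, the toy vocabulary, and the scores are all invented for illustration. The idea is to keep only the `k` highest-scoring tokens and draw one of them at random in proportion to its softmax weight.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample one token id from the k highest-scoring entries of `logits`."""
    # Keep the indices of the k largest logits.
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax restricted to the kept logits (shifted by the max for stability).
    max_logit = max(logits[i] for i in top_ids)
    weights = [math.exp(logits[i] - max_logit) for i in top_ids]
    # Draw one id in proportion to its weight.
    return rng.choices(top_ids, weights=weights, k=1)[0]


# Toy vocabulary and scores, invented for the example.
vocab = ["the", "cat", "sat", "on", "mat"]
logits = [2.0, 1.5, 0.3, -1.0, -2.0]

rng = random.Random(0)
samples = [vocab[top_k_sample(logits, k=2, rng=rng)] for _ in range(5)]
print(samples)  # with k=2, only "the" and "cat" can appear
```

Because the draw is random, a sampling-based generator can return a different continuation each time it is called with the same prompt.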

## GPT2-base

### Import GPT2-base

In [4]:
preprocessor = keras_nlp.models.GPT2CausalLMPreprocessor.from_preset(
    "gpt2_base_en",
)
gpt2_base = keras_nlp.models.GPT2CausalLM.from_preset(
    "gpt2_base_en", preprocessor=preprocessor
)

In [10]:
gpt2_base.summary()  # 124,439,808 parameters
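
The parameter count reported by `summary()` can be cross-checked by hand. The sketch below assumes the publicly documented GPT-2 hyperparameters, which do not appear in this notebook: a vocabulary of 50,257 tokens, 1,024 positions, a hidden size of 768 with 12 layers for the base model, a feed-forward width of four times the hidden size, and output weights tied to the token embedding. The function name `gpt2_parameter_count` is hypothetical.

```python
def gpt2_parameter_count(vocab_size, max_positions, hidden_dim, num_layers):
    """Count the weights of a GPT-2 style decoder with tied input/output embeddings."""
    intermediate_dim = 4 * hidden_dim  # feed-forward width (assumed 4x hidden size)

    token_embedding = vocab_size * hidden_dim
    position_embedding = max_positions * hidden_dim

    layer_norm = 2 * hidden_dim  # scale and bias
    attention = (
        hidden_dim * 3 * hidden_dim + 3 * hidden_dim  # query, key, value projections
        + hidden_dim * hidden_dim + hidden_dim  # output projection
    )
    feed_forward = (
        hidden_dim * intermediate_dim + intermediate_dim  # first dense layer
        + intermediate_dim * hidden_dim + hidden_dim  # second dense layer
    )
    per_layer = 2 * layer_norm + attention + feed_forward

    final_layer_norm = layer_norm
    return token_embedding + position_embedding + num_layers * per_layer + final_layer_norm


base = gpt2_parameter_count(vocab_size=50257, max_positions=1024, hidden_dim=768, num_layers=12)
print(f"{base:,}")  # 124,439,808
```

Under the same assumptions, a hidden size of 1,280 with 36 layers gives 774,030,080, the figure reported for GPT2-large later in this notebook.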

### GPT2-base tests

Once the pre-trained model is loaded, we can test it by asking it to generate the continuation of a text.

The model was trained on English text, so the tests must be run in English.

In [9]:
start = time.time()

output = gpt2_base.generate("French people are", max_length=200)
print("\nGPT-2 output:")
print(output)

end = time.time()
print(f"TOTAL TIME ELAPSED: {end - start:.2f}s")


GPT-2 output:
French people are not allowed to vote in France, but they do vote, according to the government of President François Hollande.

France is not a member of the European Economic Community (EEA), but its membership was established by the French in 1975.

France has a history of voting in the European Union (EU), the European Parliament (EPC), the European Commission (EC), and the European Parliament of Germany (EPFL) as well as the European Central Bank. However, the country's current membership is based on its relationship with the E.U. and the European Economic Community (EEA), which have been in conflict for decades. The European Commission, which is responsible for the E.U., is currently under a new government in Brussels.

In an interview, Hollande said the country's participation in the E.E.C. was "a positive development" in its relations with the E.U.

"We have been able to develop relations
TOTAL TIME ELAPSED: 14.64s


In [8]:
start = time.time()

output = gpt2_base.generate("French people are", max_length=200)
print("\nGPT-2 output:")
print(output)

end = time.time()
print(f"TOTAL TIME ELAPSED: {end - start:.2f}s")


GPT-2 output:

French people are increasingly concerned over the safety of their food, and are increasingly worried about their health.

A recent study by the World Bank and other organizations has found that the risk of heart and respiratory problems in France is increasing, particularly in areas with high levels of obesity.

According to the report, France is the second most obese nation in the world, behind only the United States.

"The number of people in France with heart problems and the incidence of type 2 diabetes have increased by nearly 50 percent since 2000, while the rate of obesity among French women has increased by almost 20 percent," said the report.

The report was published in the European journal Health Policy. The report found that "the number of people in France who are overweight or obese has increased by about 10 percent since 2000."

The report also said that "the health care system in France is becoming increasingly expensive."

The French government has already started a program to provide free food and

TOTAL TIME ELAPSED: 14.36s
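
Since the same timing code is repeated in every test cell, it could be factored into a small helper. The sketch below is only a suggestion. `timed_call` is a hypothetical name, and `fake_generate` is a stand-in for `gpt2_base.generate` so that the example runs without loading a model.

```python
import time


def timed_call(func, *args, **kwargs):
    """Run func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_generate(prompt, max_length=200):
    """Stand-in for gpt2_base.generate so the sketch runs without a model."""
    return prompt + " ..."


output, elapsed = timed_call(fake_generate, "French people are", max_length=200)
print("\nGPT-2 output:")
print(output)
print(f"TOTAL TIME ELAPSED: {elapsed:.2f}s")
```

With a loaded model, the call would become `timed_call(gpt2_base.generate, "French people are", max_length=200)`.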

### Other results obtained

GPT-2 output:

French people are  tired of their politicians, and they're not going to take it anymore.
The French have a long history of voting for the lesser of two evils in order to avoid the worse of the two.  This is why the French were able to get rid of the monarchy, and why they were able to get rid of the aristocracy.  They were able to do so because they were willing to sacrifice their liberties in the process, and they were willing to sacrifice their freedoms in order to avoid the worst of the two.  They are not going to do that anymore, and they are not going to do it by voting for a lesser of two evils.    They will do it by voting for the greatest of two evils.  They're going to vote for the lesser of two evils.
And this is where the French people are going to vote in this election:
They are going to vote for Marine Le Pen.

TOTAL TIME ELAPSED: 52.85s

Output:

French people are always trying to get me to do the "right" thing when it comes to the toilet seat.  

so i was sitting on my toilet seat with my back against the wall, and my legs were on the edge of the seat and my backside was on top of it.                                                                                             

Total Time Elapsed: 4.17s

<hr>

Output:

French people are so fucking lazy. fuck.
                    
Total Time Elapsed: 3.36s

GPT-2 output:

What is the capital city of country France ?

The capital of France is located in the French province of Brittany and is situated in the southern part of Europe. It is a small country in that it is not as big as most other countries in Europe, but the country of its capital is not as large as other countries. The capital city is located in the city of Marseilles, which is in the city of Marseille. It is the largest city in France, with a population of over 2 million people. It is a small country and its population is small. It has a high quality of life and has a great number of restaurants. It has a good number of hospitals and is the only city in the world with a population of more than 2 million.

What is the city of Marseille?

The city of Marseille is located in the city of Marseille, which is located in the city of Marseille in the south of France, in the city of

TOTAL TIME ELAPSED: 15.68s

## GPT2-large

After the tests with gpt2_base, we find that it is not powerful enough and gives somewhat dubious results.

We therefore ran tests on gpt2_large, which has far more parameters (774,030,080 versus 124,439,808).

### Import GPT2-large

In [11]:
preprocessor_large = keras_nlp.models.GPT2CausalLMPreprocessor.from_preset(
    "gpt2_large_en",
)
gpt2_large = keras_nlp.models.GPT2CausalLM.from_preset(
    "gpt2_large_en", preprocessor=preprocessor_large
)

Downloading data from https://storage.googleapis.com/keras-nlp/models/gpt2_large_en/v1/vocab.json
1042301/1042301 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/keras-nlp/models/gpt2_large_en/v1/merges.txt
456318/456318 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Downloading data from https://storage.googleapis.com/keras-nlp/models/gpt2_large_en/v1/model.h5
3096768960/3096768960 ━━━━━━━━━━━━━━━━━━━━ 40s 0us/step


In [12]:
gpt2_large.summary()  # 774,030,080 parameters
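
The size of the downloaded model.h5 file can be related to the parameter count. The sketch below assumes the weights are stored as 32-bit floats (4 bytes each), which the notebook does not state. The two input figures come from the download log and from `summary()`.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as float32

parameter_count = 774_030_080     # reported by gpt2_large.summary()
downloaded_bytes = 3_096_768_960  # size of model.h5 in the download log

weight_bytes = parameter_count * BYTES_PER_FLOAT32
overhead_bytes = downloaded_bytes - weight_bytes

print(f"float32 weights: {weight_bytes / 1e9:.2f} GB")
print(f"remaining bytes: {overhead_bytes / 1e3:.0f} kB")
```

Under that assumption the weights account for almost the whole file, and the small remainder is plausibly file-format metadata.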

### GPT2-large tests

In [13]:
start = time.time()

output = gpt2_large.generate("French people are", max_length=200)
print("\nGPT-2 output:")
print(output)

end = time.time()
print(f"TOTAL TIME ELAPSED: {end - start:.2f}s")


GPT-2 output:

French people are not happy about the government's decision to allow gay marriage and are demanding the government take action on the issue.

A recent survey of 1,000 French people conducted by the French polling company Ifop found that a whopping 70.7 percent of those surveyed were opposed to same-sex marriage.

This means the country is now the most homophobic nation on the planet.

"The survey shows that French people are not happy about the government's decision to allow gay marriage, and that they want the government to intervene," the BBC quoted French President Francois Hollande as saying. "I think that the people of France want the government to take a strong position on this issue."

Hollande has been under fire for his decision to allow same-sex marriage. His popularity has plummeted since he made his announcement.

The BBC reports that Hollande's approval rating has fallen to just 34 percent, and that he is the most unpopular French president in modern history.

TOTAL TIME ELAPSED: 47.22s

In [14]:
start = time.time()

output = gpt2_large.generate("French people are", max_length=200)
print("\nGPT-2 output:")
print(output)

end = time.time()
print(f"TOTAL TIME ELAPSED: {end - start:.2f}s")


GPT-2 output:
French people are the world's most generous people, according to a study that reveals how the country has the highest per capita generosity in the world.

In the study by the charity Oxfam, France was ranked the world's "best country" for generosity, ahead of Norway, Switzerland, Finland and the United States.

Oxfam found that the average French person donated $1,500 in 2013. This was the second highest per person in the world. The UK was second, with a donation average of $1,200.

The charity also found that the average French family gave $1,800 in donations.

France has been ranked number one on the Oxfam index since 2008, with the UK in second place, and the US in third.

The report also found that France has the highest per capita generosity of any country. In 2013, per person, the UK was the third most generous country in the world, after Switzerland and Denmark.
TOTAL TIME ELAPSED: 44.89s


### Other results obtained

GPT-2 output:

French people are still waiting for their government to take action on climate change, a survey by polling agency Ifop found, as the country struggles with its worst floods and wildfires in decades.

A total of 64 percent of respondents said they were concerned about the effects climate change could have on their lives, with just under half saying they felt it could have an impact on the country's economy.

The poll also found that a quarter of French people said they believed climate change was a "very important issue," compared to just 10 percent who said the same in the United Kingdom.

French citizens are also concerned about the country's economy, with a third of respondents saying it would be "very bad" for the economy if the government did not take action on climate change.

Ifop surveyed 2,500 people in France, the United Kingdom, Germany, and Italy in the run up to Christmas.

The poll comes after a series of deadly floods in the country,

TOTAL TIME ELAPSED: 46.62s

GPT-2 output:

Turkish people are being forced to pay for a government-funded project to build a bridge over the Bosphorus Strait, a Turkish newspaper reported Tuesday.


The bridge would be a key part of a plan to build a bridge over the strait that would cut off a major shipping route for Turkey from the European Union.


The bridge, which is being financed with a 1.5 billion Turkish Liras ($9.5 million) loan from the Turkish government, is being built by the Turkish company Aselsan.


But the project has been delayed, and it is unclear when construction will begin.


The project has been delayed for years because of a dispute with the Turkish government, which is concerned about the cost and the environmental impact.


Aselsan, a major construction and engineering company, is a member of the consortium that has built the bridge over the Bosphorus.


The project was first announced in 2009 and the project was initially expected

TOTAL TIME ELAPSED: 44.53s

We observe that GPT2-large performs better and that its outputs seem more plausible than those of GPT2-base. However, it can very easily drift off topic.
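
The cost of that improvement can be estimated from the figures reported in the test cells. The sketch below uses only the two timed runs per model and the parameter counts from `summary()`. With so few runs, from a single Colab session, it is a rough indication and not a benchmark.

```python
from statistics import mean

# Timings (seconds) and parameter counts reported in the cells above.
base_times = [14.64, 14.36]
large_times = [47.22, 44.89]
base_params = 124_439_808
large_params = 774_030_080

time_ratio = mean(large_times) / mean(base_times)
param_ratio = large_params / base_params

print(f"GPT2-large: about {time_ratio:.1f}x the generation time "
      f"for about {param_ratio:.1f}x the parameters")
```

On these numbers, generation with GPT2-large takes roughly 3.2 times longer, for about 6.2 times as many parameters.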