# Gemini Analysis for controls in the agricultural sector

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://ai.google.dev/tutorials/quickstart_colab"><img src="https://ai.google.dev/static/site-assets/images/docs/notebook-site-button.png" height="32" width="32" />View on Google AI</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google/generative-ai-docs/blob/main/site/en/tutorials/quickstart_colab.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/google/generative-ai-docs/blob/main/site/en/tutorials/quickstart_colab.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>

## Prerequisites

You can run this tutorial in Google Colab, which doesn't require additional environment configuration.

Alternatively, to complete this quickstart locally, see the Python guidance in [Get started with the Gemini API](https://ai.google.dev/tutorials/quickstart).

## Install the SDK

The Python SDK for the Gemini API is contained in the [`google-generativeai`](https://pypi.org/project/google-generativeai/) package. Install the dependency using pip:

In [None]:
!pip install -q -U google-generativeai

## Set up your API key

To use the Gemini API, you'll need an API key. If you don't already have one, create a key in Google AI Studio.

<a class="button" href="https://aistudio.google.com/app/apikey" target="_blank" rel="noopener noreferrer">Get an API key</a>

In Colab, add the key to the secrets manager under the "🔑" in the left panel. Give it the name `GOOGLE_API_KEY`. Then pass the key to the SDK:

In [None]:
# Import the Python SDK
import google.generativeai as genai
# Used to securely store your API key
from google.colab import userdata

GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
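
When the notebook runs locally instead of in Colab, `google.colab.userdata` is not available. The following is a minimal sketch of one way to handle both cases, assuming the key is exported as an environment variable named `GOOGLE_API_KEY`. The helper name `load_api_key` is hypothetical and is not part of the SDK.

```python
import os


def load_api_key(env_var="GOOGLE_API_KEY"):
    """Return the API key from Colab secrets if available, else from the environment."""
    try:
        # google.colab can only be imported inside a Colab runtime
        from google.colab import userdata
        key = userdata.get(env_var)
    except Exception:
        # Not in Colab, or the secret is missing or not accessible
        key = None
    key = key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable or Colab secret")
    return key
```

The result can then be passed to the SDK with `genai.configure(api_key=load_api_key())`.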

## Initialize the Generative Model

Before you can make any API calls, you need to initialize the Generative Model.

In [None]:
model = genai.GenerativeModel('gemini-pro')

## A short example of using Gemini

In [None]:
# The prompt is in French: "What is the name of the capital of Switzerland?"
response = model.generate_content("Quel est le nom de la capitale de la suisse ?")
print(response.text)

Berne


## What's next

To learn more about working with the Gemini API, see the [Python tutorial](https://ai.google.dev/tutorials/python_quickstart).

If you're new to generative AI models, you might want to look at the
[concepts guide](https://ai.google.dev/docs/concepts) and the
[Gemini API overview](https://ai.google.dev/docs/gemini_api_overview).

# Export all the control points from a collection of PDFs

In [1]:
!pip install -q -U google-generativeai PyPDF2 pandas openpyxl


In [2]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [57]:
import os
import PyPDF2
import pandas as pd
import time
from openpyxl import load_workbook
from google.colab import files

In [3]:
# Import the Python SDK
import google.generativeai as genai
# Used to securely store your API key
from google.colab import userdata

GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)

In [9]:
def extract_text_from_pdf(pdf_path):
    """Return the text of every page in the PDF, concatenated."""
    with open(pdf_path, 'rb') as pdf_file:
        pdf_reader = PyPDF2.PdfReader(pdf_file)
        text = ''
        for page in pdf_reader.pages:
            # Pages without a text layer (e.g. scans) contribute nothing
            text += page.extract_text() or ''
        return text

In [63]:
def summarize_text(text, max_chars=30000):
    # Truncate by characters (not tokens) to limit the size of the prompt
    truncated_text = text[:max_chars]
    model = genai.GenerativeModel('gemini-pro')
    prompt = (
        "The following content contains control points for the agricultural sector. "
        "List these control points as bullet points, without categories or groups, "
        "and write them in French. Each bullet point must be on a single line "
        "with no line breaks.\n\n"
    )
    response = model.generate_content(prompt + truncated_text)
    return response.text
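
`summarize_text` keeps only the first 30,000 characters of a document, so control points that appear later in a long PDF are never sent to the model. One possible workaround is to split the text into chunks and summarize each chunk separately. The sketch below shows only the splitting step. `chunk_text` and its `overlap` parameter are hypothetical names introduced for illustration, and the limit is counted in characters, not model tokens.

```python
def chunk_text(text, max_chars=30000, overlap=200):
    """Split text into consecutive chunks of at most max_chars characters.

    Neighbouring chunks share `overlap` characters, so a short passage that
    straddles a boundary is likely to appear whole in one of the chunks.
    """
    if max_chars <= overlap:
        raise ValueError("max_chars must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last chunk reached the end of the text
        start += max_chars - overlap
    return chunks
```

Each chunk could then be passed to `summarize_text` in turn. Because of the overlap, the same control point may be returned twice, so the combined results may need deduplication.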

In [55]:
def write_to_excel(data, filename):
    df = pd.DataFrame(data)
    # Check whether the Excel file already exists
    if not os.path.exists(filename):
        # Create a new Excel file with a single sheet
        with pd.ExcelWriter(filename, engine='openpyxl') as writer:
            df.to_excel(writer, sheet_name='ControlPoints', index=False)
    else:
        # Load the existing Excel file
        book = load_workbook(filename)
        if 'ControlPoints' in book.sheetnames:
            sheet = book['ControlPoints']
        else:
            # If the sheet does not exist, create it
            sheet = book.create_sheet('ControlPoints')

        # Write the new rows after the last used row of the sheet
        max_row = sheet.max_row
        for i, row in df.iterrows():
            for j, value in enumerate(row):
                sheet.cell(row=max_row + 1 + i, column=j + 1, value=value)

        # Save the changes to the Excel file
        book.save(filename)

In [65]:
# Directory containing the PDF files
# pdf_directory = '/content/drive/MyDrive/OFAG/Controls/FR_ControlPDF'
pdf_directory = '/content/drive/MyDrive/OFAG/Controls/DE_ControlPDF'
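
The main loop reads this directory with `os.listdir`, which returns entries in arbitrary order, and it matches only the lowercase `.pdf` extension. If a reproducible processing order matters, for example to resume after an interrupted run, a small helper along these lines may help. This is a sketch, and `list_pdfs` is a hypothetical name that does not appear in the original notebook.

```python
import os


def list_pdfs(directory):
    """Return the PDF file names in `directory`, sorted by name.

    The extension is matched case-insensitively, and sub-directories
    whose names happen to end in .pdf are skipped.
    """
    return sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith(".pdf")
        and os.path.isfile(os.path.join(directory, name))
    )
```

The loop could then iterate over `list_pdfs(pdf_directory)` instead of calling `os.listdir` directly.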


In [66]:
# Counter for the PDF files processed so far
file_counter = 0
fichier_excel = 'DE_ControlPDF_Summary.xlsx'
#fichier_excel = 'FR_ControlPDF_Summary.xlsx'

# Loop over the PDF files, extract their text, and summarize it
for filename in os.listdir(pdf_directory):
    if filename.endswith('.pdf'):
        file_counter += 1
        print(file_counter)
        # Always true as written; a higher threshold would skip the first files
        if file_counter > 0:
            pdf_path = os.path.join(pdf_directory, filename)
            text = extract_text_from_pdf(pdf_path)
            #time.sleep(60)
            summary = summarize_text(text)
            lignes_contenu = summary.splitlines()
            # Keep each non-empty line as one row
            data = []
            for ligne in lignes_contenu:
                if len(ligne) > 0:
                    data.append({'File': filename, 'Control points': ligne})
            write_to_excel(data, fichier_excel)
            #break

# Download the Excel file to the local machine through the browser
files.download(fichier_excel)

1
2
3
...
85


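
The processing loop contains a commented-out `time.sleep(60)`, which suggests that API rate limits may have been a concern when summarizing many files in a row. A fixed pause slows down every call. An alternative is to retry only the calls that fail, waiting longer after each failure. The sketch below is one way to do that. `call_with_retry` is a hypothetical helper, and it catches every exception because the notebook does not say which error the SDK raises when a quota is exceeded.

```python
import time


def call_with_retry(func, *args, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(*args); on an exception, wait and retry with exponential backoff."""
    for attempt in range(retries):
        try:
            return func(*args)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** attempt)
```

In the loop, `summary = summarize_text(text)` could become `summary = call_with_retry(summarize_text, text, base_delay=10.0)`. The delay value here is only an illustration.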