# Pipeline Notebook

This notebook simplifies assembling and exporting the pipelines created across the different approaches.

# Setup

In [8]:
# Install the dependencies used by this notebook.
%pip install pandas
%pip install spacy
%pip install scikit-learn

# Download the large English spaCy model.
import spacy
spacy.cli.download('en_core_web_lg')


Note: you may need to restart the kernel to use updated packages.

✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_lg')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


# Pipeline

## Importing the pipeline functions

The pipeline is built from the functions declared in [pre_processing.Ipynb](./pre_processing.Ipynb), which are copied over so they can be imported here.
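
The function definitions themselves are not shown in this notebook. As a rough idea of what the two string-level steps could look like, here is a minimal sketch. The function bodies and the `URL_PATTERN` regex are assumptions for illustration only, not the code from `pre_processing.Ipynb`, and the spaCy-based `tokenize_and_pre_processing` is left out.

```python
import re

# Hypothetical stand-ins: the real definitions live in the pre-processing notebook.
URL_PATTERN = re.compile(r"https?://\S+")


def to_lower(text: str) -> str:
    """Lowercase the whole input string."""
    return text.lower()


def extract_links(text: str) -> str:
    """Drop URLs from the text and trim surrounding whitespace."""
    return URL_PATTERN.sub("", text).strip()


sample = "The #Uber files  https://t.co/hjsUSc6AVZ"
cleaned = extract_links(to_lower(sample))
print(cleaned)
```

Keeping each step as a plain function that takes a value and returns a new one is what makes it easy to wrap later in a `FunctionTransformer`.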

## Creating the pipeline

In [9]:
import sys
sys.path.append('./')  # make the local pipeline module importable
from pipeline import to_lower, extract_links, tokenize_and_pre_processing

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Wrap each plain function so it can be used as a scikit-learn pipeline step.
to_lower_transformer = FunctionTransformer(to_lower)
extract_links_transformer = FunctionTransformer(extract_links)
tokenize_and_pre_processing_transformer = FunctionTransformer(tokenize_and_pre_processing)

# Steps run in the order listed here.
pipeline = Pipeline([
    ('to_lower', to_lower_transformer),
    ('remove_links', extract_links_transformer),
    ('tokenize', tokenize_and_pre_processing_transformer)
])
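
Conceptually, transforming an input with a pipeline made only of `FunctionTransformer` steps amounts to calling each wrapped function in order and feeding each result to the next step. The plain-Python sketch below mirrors that flow and records every intermediate value, which can help when checking a single step in isolation. The step functions and the `run_steps` helper are hypothetical stand-ins, not the functions imported from the `pipeline` module.

```python
import re

# Stand-in steps (assumptions): the real functions come from the pipeline module.
steps = [
    ("to_lower", str.lower),
    ("remove_links", lambda text: re.sub(r"https?://\S+", "", text).strip()),
    ("tokenize", str.split),  # placeholder for the spaCy-based tokenizer
]


def run_steps(steps, data):
    """Apply each named step in order, keeping every intermediate result."""
    trace = []
    for name, func in steps:
        data = func(data)
        trace.append((name, data))
    return data, trace


result, trace = run_steps(steps, "I had a BAD drive https://t.co/abc")
for name, value in trace:
    print(f"{name}: {value!r}")
```

Because each step only sees the output of the previous one, the order of the steps matters: here the tokenizer placeholder receives text that is already lowercased and free of links.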

## Testing the pipeline

In [10]:
phrases = [
"Spent 20 minutes in an Uber listening to what I can best describes as ?Eagles B-sides, but about Jesus?",
"via The Guardian  Guardian front page, Monday 11 July 2022 - The #Uber files: Leak reveals secret lobbying operation to conquer the world  https://t.co/hjsUSc6AVZ",
"i had a bad drive . i want my refund",
]

for phrase in phrases:
    transformed_text = pipeline.transform(phrase)
    print(transformed_text)

['spend', '20', 'minute', 'uber', 'listening', 'well', 'describe', 'eagle', 'side', 'jesus']
['guardian', 'guardian', 'front', 'page', 'monday', '11', 'july', '2022', 'uber', 'file', 'leak', 'reveal', 'secret', 'lobbying', 'operation', 'conquer', 'world']

## Exporting the pipeline

In [11]:
import joblib
joblib.dump(pipeline, '../data/pipeline.joblib')

['./data/pipeline.joblib']