In [None]:
!pip install transformers



In [None]:
!pip install TextBlob



In [None]:
!pip install nltk



In [None]:
!python -m textblob.download_corpora

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package conll2000 to /root/nltk_data...
[nltk_data]   Unzipping corpora/conll2000.zip.
[nltk_data] Downloading package movie_reviews to /root/nltk_data...
[nltk_data]   Unzipping corpora/movie_reviews.zip.
Finished.


## 1. <a name="1">Working with simple text-cleaning processes</a>


In this section, you will do some general-purpose text cleaning. The following methods for cleaning can be extended, depending on the application.
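As a rough illustration of what general-purpose cleaning can look like, the sketch below lowercases text, strips ASCII punctuation, and collapses whitespace using only the standard library. The function name `basic_clean` and the choice of steps are assumptions made for illustration; they are not part of the TextBlob workflow used in the rest of this notebook.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def basic_clean(text):
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    text = text.lower()
    text = text.translate(_PUNCT_TABLE)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


sample = "Tighter  credit conditions are a natural result of tighter monetary policy."
print(basic_clean(sample))
```

Depending on the application, steps such as stop-word removal or number normalization could be added to the same function.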

In [None]:
# TextBlob is an object-oriented NLP text-processing library built on the NLTK and
# pattern NLP libraries; it simplifies many of their capabilities.

from textblob import TextBlob
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [None]:
# The text is an excerpt from "Financial Stability and Economic Developments" by Chair Jerome H. Powell.

text = ' '.join(('The economy is also facing headwinds from tighter credit conditions for households and businesses',
'which are likely to weigh on economic activity, hiring, and inflation.',
'Tighter credit conditions are a natural result of tighter monetary policy. But the bank stresses that emerged in March may well lead to a further tightening in credit conditions.',
'The extent of these effects remains uncertain.',
'When bank stress emerged in March, we acted in concert with other government agencies to address it, enabling the Federal Deposit Insurance Corporation to resolve two failed banks in a manner that protected all depositors.',
'We also used our liquidity tools to make funding available to banks that might need it.'))

blob = TextBlob(text)
print(blob)


The economy is also facing headwinds from tighter credit conditions for households and businesses which are likely to weigh on economic activity, hiring, and inflation. Tighter credit conditions are a natural result of tighter monetary policy. But the bank stresses that emerged in March may well lead to a further tightening in credit conditions. The extent of these effects remains uncertain. When bank stress emerged in March, we acted in concert with other government agencies to address it, enabling the Federal Deposit Insurance Corporation to resolve two failed banks in a manner that protected all depositors. We also used our liquidity tools to make funding available to banks that might need it.


In [None]:
# Tokenizing text into sentences and words
blob.sentences

[Sentence("The economy is also facing headwinds from tighter credit conditions for households and businesses which are likely to weigh on economic activity, hiring, and inflation."),
 Sentence("Tighter credit conditions are a natural result of tighter monetary policy."),
 Sentence("But the bank stresses that emerged in March may well lead to a further tightening in credit conditions."),
 Sentence("The extent of these effects remains uncertain."),
 Sentence("When bank stress emerged in March, we acted in concert with other government agencies to address it, enabling the Federal Deposit Insurance Corporation to resolve two failed banks in a manner that protected all depositors."),
 Sentence("We also used our liquidity tools to make funding available to banks that might need it.")]

In [None]:
# The words property returns a WordList object containing a list of Word objects,
# one for each word in the TextBlob, with the punctuation removed.

blob.words

WordList(['The', 'economy', 'is', 'also', 'facing', 'headwinds', 'from', 'tighter', 'credit', 'conditions', 'for', 'households', 'and', 'businesses', 'which', 'are', 'likely', 'to', 'weigh', 'on', 'economic', 'activity', 'hiring', 'and', 'inflation', 'Tighter', 'credit', 'conditions', 'are', 'a', 'natural', 'result', 'of', 'tighter', 'monetary', 'policy', 'But', 'the', 'bank', 'stresses', 'that', 'emerged', 'in', 'March', 'may', 'well', 'lead', 'to', 'a', 'further', 'tightening', 'in', 'credit', 'conditions', 'The', 'extent', 'of', 'these', 'effects', 'remains', 'uncertain', 'When', 'bank', 'stress', 'emerged', 'in', 'March', 'we', 'acted', 'in', 'concert', 'with', 'other', 'government', 'agencies', 'to', 'address', 'it', 'enabling', 'the', 'Federal', 'Deposit', 'Insurance', 'Corporation', 'to', 'resolve', 'two', 'failed', 'banks', 'in', 'a', 'manner', 'that', 'protected', 'all', 'depositors', 'We', 'also', 'used', 'our', 'liquidity', 'tools', 'to', 'make', 'funding', 'available', 'to'

In [None]:
# Parts-of-Speech Tagging
# Parts-of-speech (POS) tagging is the process of evaluating words based on their context to determine
# each word’s part of speech. There are eight primary English parts of speech: nouns, pronouns, verbs,
# adjectives, adverbs, prepositions, conjunctions and interjections.

blob.tags

[('The', 'DT'),
 ('economy', 'NN'),
 ('is', 'VBZ'),
 ('also', 'RB'),
 ('facing', 'VBG'),
 ('headwinds', 'NNS'),
 ('from', 'IN'),
 ('tighter', 'JJR'),
 ('credit', 'NN'),
 ('conditions', 'NNS'),
 ('for', 'IN'),
 ('households', 'NNS'),
 ('and', 'CC'),
 ('businesses', 'NNS'),
 ('which', 'WDT'),
 ('are', 'VBP'),
 ('likely', 'JJ'),
 ('to', 'TO'),
 ('weigh', 'VB'),
 ('on', 'IN'),
 ('economic', 'JJ'),
 ('activity', 'NN'),
 ('hiring', 'NN'),
 ('and', 'CC'),
 ('inflation', 'NN'),
 ('Tighter', 'NNP'),
 ('credit', 'NN'),
 ('conditions', 'NNS'),
 ('are', 'VBP'),
 ('a', 'DT'),
 ('natural', 'JJ'),
 ('result', 'NN'),
 ('of', 'IN'),
 ('tighter', 'JJR'),
 ('monetary', 'JJ'),
 ('policy', 'NN'),
 ('But', 'CC'),
 ('the', 'DT'),
 ('bank', 'NN'),
 ('stresses', 'VBZ'),
 ('that', 'IN'),
 ('emerged', 'VBD'),
 ('in', 'IN'),
 ('March', 'NNP'),
 ('may', 'MD'),
 ('well', 'RB'),
 ('lead', 'VB'),
 ('to', 'TO'),
 ('a', 'DT'),
 ('further', 'JJ'),
 ('tightening', 'NN'),
 ('in', 'IN'),
 ('credit', 'NN'),
 ('conditions', 
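
The tags in the output above are Penn Treebank codes, which are finer-grained than the eight primary parts of speech. As a minimal sketch, the code below groups the first ten (word, tag) pairs from that output into coarse categories by tag prefix. The `COARSE_PREFIXES` mapping and the `coarse_tag` helper are hypothetical names introduced here and cover only the prefixes needed for this sample.

```python
from collections import Counter

# Coarse category for a few Penn Treebank tag prefixes (illustrative, not exhaustive).
COARSE_PREFIXES = {
    "NN": "noun",
    "VB": "verb",
    "JJ": "adjective",
    "RB": "adverb",
    "DT": "determiner",
    "IN": "preposition/subordinating conjunction",
    "CC": "coordinating conjunction",
}


def coarse_tag(tag):
    """Return the coarse category whose prefix matches the tag, else 'other'."""
    for prefix, category in COARSE_PREFIXES.items():
        if tag.startswith(prefix):
            return category
    return "other"


# First ten (word, tag) pairs copied from the blob.tags output.
sample_tags = [
    ("The", "DT"), ("economy", "NN"), ("is", "VBZ"), ("also", "RB"),
    ("facing", "VBG"), ("headwinds", "NNS"), ("from", "IN"),
    ("tighter", "JJR"), ("credit", "NN"), ("conditions", "NNS"),
]

counts = Counter(coarse_tag(tag) for _, tag in sample_tags)
print(counts)
```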

In [None]:
blob.noun_phrases

WordList(['tighter credit conditions', 'economic activity', 'tighter', 'credit conditions', 'natural result', 'monetary policy', 'bank stresses', 'march', 'credit conditions', 'bank stress', 'march', 'government agencies', 'deposit', 'insurance corporation', 'liquidity tools'])

# Sentiment Analysis with TextBlob’s Default Sentiment Analyzer

In [None]:
# One of the most common and valuable NLP tasks is sentiment analysis, which determines
# whether text is positive, neutral or negative.

blob.sentiment

Sentiment(polarity=0.010714285714285723, subjectivity=0.45357142857142857)

In the preceding output, the polarity indicates sentiment with a value from -1.0 (negative) to 1.0 (positive), with 0.0 being neutral. The subjectivity is a value from 0.0 (objective) to 1.0 (subjective). Based on the values for our TextBlob, the overall sentiment is close to neutral (polarity of about 0.01), and the text sits near the middle of the subjectivity scale (about 0.45), leaning slightly toward objective.
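
To turn a polarity score into a coarse label, one option is a simple threshold rule, sketched below. The helper name `polarity_label` and the ±0.05 cut-off are assumptions chosen for illustration; TextBlob itself does not define such thresholds.

```python
def polarity_label(polarity, threshold=0.05):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    The default threshold is an arbitrary assumption, not a TextBlob setting.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Polarity value taken from the blob.sentiment output.
print(polarity_label(0.010714285714285723))
```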

In [None]:
blob.sentiment.polarity

0.010714285714285723

# Sentiment Analysis with the NaiveBayesAnalyzer

In [None]:
# The TextBlob library also comes with a NaiveBayesAnalyzer (module textblob.sentiments),
# which was trained on a database of movie reviews. Naive Bayes is a commonly used
# machine learning text-classification algorithm.

from textblob.sentiments import NaiveBayesAnalyzer
blob = TextBlob(text, analyzer=NaiveBayesAnalyzer())


In [None]:
blob.sentiment

Sentiment(classification='pos', p_pos=0.999999292366756, p_neg=7.07633249968906e-07)

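In this case, the overall sentiment is classified as positive (classification='pos'). The Sentiment object’s p_pos indicates a probability of about 99.9999% that the text is positive, and its p_neg indicates a probability of about 0.00007% that it is negative. Because p_pos is far larger than p_neg, the analyzer labels the text positive. Keep in mind that this analyzer was trained on movie reviews, so its verdict on financial text may not be reliable.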
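
As a quick check of how the two probabilities in the output relate to the label, the sketch below formats them as percentages and picks the larger one. The values are copied from the output above; the variable names are illustrative.

```python
# Probabilities copied from the blob.sentiment output.
p_pos = 0.999999292366756
p_neg = 7.07633249968906e-07

print(f"positive: {p_pos:.4%}")
print(f"negative: {p_neg:.6%}")

# The two probabilities sum to 1 (up to floating-point error),
# and the larger one corresponds to the reported classification.
classification = "pos" if p_pos > p_neg else "neg"
print(classification)
```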

# Normalization: Stemming and Lemmatization

In [None]:
# Stemming removes a prefix or suffix from a word, leaving only a stem, which may or may not be a real word.
# Lemmatization is similar, but factors in the word’s part of speech and meaning and results in a real word.
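
The difference can be seen with a deliberately tiny, pure-Python sketch. The suffix rules and the lemma dictionary below are hypothetical stand-ins and far cruder than a real stemmer or a WordNet-based lemmatizer; in TextBlob the corresponding operations are the `stem()` and `lemmatize()` methods of `Word`.

```python
# Toy suffix rules, tried in order (hypothetical; much simpler than a Porter stemmer).
SUFFIX_RULES = [("ies", "i"), ("ing", ""), ("ed", ""), ("s", "")]

# Tiny hand-written lemma dictionary keyed by (word, part of speech); hypothetical.
LEMMAS = {
    ("emerged", "v"): "emerge",
    ("conditions", "n"): "condition",
    ("studies", "n"): "study",
}


def toy_stem(word):
    """Strip the first matching suffix; the result may not be a real word."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    return word


def toy_lemmatize(word, pos="n"):
    """Look up the dictionary form for a word and its part of speech."""
    return LEMMAS.get((word.lower(), pos), word.lower())


for word, pos in [("emerged", "v"), ("conditions", "n"), ("studies", "n")]:
    print(word, "| stem:", toy_stem(word), "| lemma:", toy_lemmatize(word, pos))
```

Here the stems "emerg" and "studi" are not real words, while the lemmas "emerge" and "study" are.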

# FinancialBERT for Sentiment Analysis

FinancialBERT is a BERT model pre-trained on a large corpus of financial texts. Its purpose is to advance NLP research and practice in the financial domain, so that practitioners and researchers can benefit from the model without needing the significant computational resources required to train it.

In [None]:
from transformers import BertTokenizer, BertModel, BertForSequenceClassification
from transformers import pipeline

In [None]:
classifier = pipeline("sentiment-analysis")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

In [None]:
# Now let's use the classifier to analyze our text.
# The pipeline accepts a single string or a list of strings, so the sentences are passed as one list.
# Passing them as separate positional arguments raises:
# TypeError: TextClassificationPipeline.__call__() takes 2 positional arguments but 7 were given
classifier(["The economy is also facing headwinds from tighter credit conditions for households and businesses.",
            "Which are likely to weigh on economic activity, hiring, and inflation.",
            "Tighter credit conditions are a natural result of tighter monetary policy. But the bank stresses that emerged in March may well lead to a further tightening in credit conditions.",
            "The extent of these effects remains uncertain.",
            "When bank stress emerged in March, we acted in concert with other government agencies to address it, enabling the Federal Deposit Insurance Corporation to resolve two failed banks in a manner that protected all depositors.",
            "We also used our liquidity tools to make funding available to banks that might need it.",
            ])

In [None]:
# Using FinancialBERT
model = BertForSequenceClassification.from_pretrained("ahmedrachid/FinancialBERT-Sentiment-Analysis",num_labels=3)
tokenizer = BertTokenizer.from_pretrained("ahmedrachid/FinancialBERT-Sentiment-Analysis")


In [None]:
nlp = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

sentences = ["The economy is also facing headwinds from tighter credit conditions for households and businesses.",
             "Which are likely to weigh on economic activity, hiring, and inflation.",
             "Tighter credit conditions are a natural result of tighter monetary policy. But the bank stresses that emerged in March may well lead to a further tightening in credit conditions.",
             "The extent of these effects remains uncertain.",
             "When bank stress emerged in March, we acted in concert with other government agencies to address it, enabling the Federal Deposit Insurance Corporation to resolve two failed banks in a manner that protected all depositors.",
             "We also used our liquidity tools to make funding available to banks that might need it.",
            ]

# Analyze the sentiment of each sentence
results = nlp(sentences)

print(results)



[{'label': 'negative', 'score': 0.9989545345306396}, {'label': 'neutral', 'score': 0.9949797987937927}, {'label': 'negative', 'score': 0.9926998019218445}, {'label': 'neutral', 'score': 0.9414845108985901}, {'label': 'positive', 'score': 0.9889850616455078}, {'label': 'neutral', 'score': 0.9995952248573303}]
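
Because the pipeline returns one result per sentence, a simple tally gives a quick overall picture. The sketch below counts the labels with `collections.Counter`; the `results` list is copied from the output above so that the example runs on its own.

```python
from collections import Counter

# Per-sentence results copied from the FinancialBERT output.
results = [
    {"label": "negative", "score": 0.9989545345306396},
    {"label": "neutral", "score": 0.9949797987937927},
    {"label": "negative", "score": 0.9926998019218445},
    {"label": "neutral", "score": 0.9414845108985901},
    {"label": "positive", "score": 0.9889850616455078},
    {"label": "neutral", "score": 0.9995952248573303},
]

# Count how many sentences received each label.
label_counts = Counter(r["label"] for r in results)
print(label_counts.most_common())
```

For this excerpt the tally is three neutral, two negative, and one positive sentence.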


In [None]:
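# Based on our text, let's try prompting text generation from the FinancialBERT checkpoint.
# Note: this checkpoint is a sentiment classifier, so its language-modeling head is newly
# initialized (see the warning in the output) and the generated text is not meaningful.

from transformers import pipeline
generator = pipeline('text-generation', model='ahmedrachid/FinancialBERT-Sentiment-Analysis')
generator("FinancialBERT has", do_sample=True, max_length=20)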



If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
Some weights of BertLMHeadModel were not initialized from the model checkpoint at ahmedrachid/FinancialBERT-Sentiment-Analysis and are newly initialized: ['cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


[{'generated_text': 'FinancialBERT has cashsticmbl loom opening pacehoo injury half telephone incurrence added driven cusrd perc buyingver'}]

In [None]:
# Generate text from a prompt

prompt = "the state of economy in USA "
res = generator(prompt, truncation=True, max_length=100, temperature=0.9)
print(res[0]['generated_text'])



the state of economy in USA  est status night attendant 85% world integrated oct cannibalization contract pdufa fin man inning offshore density integration finishes long bargaining que vitamin integration base discrete await await respect base ete design flu formulation monitor acquisitive lasting newport phase diligence anti council taxwing namely lb impa omni audits borders pdufasmi kro pre networking worksholder per blam romflowresearch theatrical bhd bhd friendlyasti macro d design flu procurement flu thereofholderuct cotton congressional design flu capacity await await respect baseaction flu thereofanyld self reliedflow
