### This notebook contains verb-cleaning functions, with one example in the last block.

#### Given a word, "verb_forms" returns a list of all forms of the verb. If the word is not a verb, it returns a list containing only the word itself.

In [72]:
import lemminflect

def verb_forms(word):
    # Get all inflections for the word as a verb
    inflections = lemminflect.getAllInflections(word, upos='VERB')

    # Each value is a tuple of spellings; keep the first spelling of each form
    form_set = {forms[0] for forms in inflections.values()}

    # No verb inflections found: return a list with only the word itself.
    # (Comparing a set with {} is always False, so test for emptiness instead.)
    if not form_set:
        return [word]
    return list(form_set)

verb_forms('apple')

verb_forms('underreport')


['underreport']
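
Because `verb_forms` builds its result from a set, the order of the returned list is not fixed, and the example outputs later in this notebook suggest that the first element of the list is the one used as the replacement (for instance, `bought` becomes `buys`). The following is a minimal sketch of how the list could be ordered with the base form first. `order_forms` is a hypothetical helper and is not part of the original notebook.

```python
def order_forms(lemma, forms):
    """Return the forms with the lemma first and the rest sorted alphabetically.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    others = sorted(set(forms) - {lemma})
    return [lemma] + others

print(order_forms('buy', {'buys', 'buy', 'buying', 'bought'}))
# ['buy', 'bought', 'buying', 'buys']
```

With an ordering like this, a replacement that targets the first element would always pick the base form rather than whichever form the set happens to yield first.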

### The text-cleaning functions and a simple example

In [99]:
### Let's make a new collection of cleaning functions:

import re
from nltk import pos_tag, word_tokenize
from nltk.stem import WordNetLemmatizer

# NOTE: clean_word(text, form_list) is not defined in this cell;
# it is expected to be defined elsewhere in the notebook.
lemmatizer = WordNetLemmatizer()

# Hand-listed verb families that are always cleaned
misreport = ['misreport', 'misreports', 'misreporting', 'misreported']
underreport = ['underreport', 'underreports', 'underreporting', 'underreported']
unreport = ['unreport', 'unreports', 'unreporting', 'unreported']
pre_selected_word_list = [misreport, underreport, unreport]

def collect_verb(text):
    # Keep every token whose POS tag starts with 'V' (VB, VBD, VBG, ...)
    words = word_tokenize(text)
    pos_tags = pos_tag(words)
    verbs = [word for word, tag in pos_tags if tag.startswith('V')]
    return verbs

def rep_word_text(text):
    new_text = text

    # First clean the pre-selected verb families
    for word_form_list in pre_selected_word_list:
        if len(word_form_list) != 1:
            new_text = clean_word(new_text, word_form_list)

    # Then clean every verb found by the POS tagger.
    # Verbs with a single form are skipped; the text is never reset to the
    # original, so earlier replacements are kept.
    for verb in collect_verb(new_text):
        lemma = lemmatizer.lemmatize(verb.lower(), pos='v')
        verb_form_list = verb_forms(lemma)
        print("The verb:", verb, "\n   basic form: ", lemma, "\n   the list: ", verb_form_list)
        if len(verb_form_list) != 1:
            new_text = clean_word(new_text, verb_form_list)
    return new_text

text = "He walked quickly to the store and bought some groceries. Vessels caught misreporting fish"
print(rep_word_text(text))

The verb: walked 
   basic form:  walk 
   the list:  ['walk', 'walked', 'walks', 'walking']
The verb: bought 
   basic form:  buy 
   the list:  ['buys', 'buy', 'buying', 'bought']
The verb: caught 
   basic form:  catch 
   the list:  ['catch', 'catches', 'catching', 'caught']
He walk quickly to the store and buys some groceries. Vessels catch misreport fish
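
The helper `clean_word` is called above but is not shown in this section. Judging from the output, every form in the list appears to be replaced by the first element of that list (`bought` becomes `buys` because `'buys'` comes first), and matching appears to be case-sensitive. The following is a minimal sketch under those assumptions. The name `clean_word_sketch` is hypothetical, and the real `clean_word` may differ.

```python
import re

def clean_word_sketch(text, word_forms):
    """Replace every listed form in `text` with the first entry of `word_forms`.

    Hypothetical stand-in for the notebook's clean_word; whole-word,
    case-sensitive matching is an assumption.
    """
    target = word_forms[0]
    # \b anchors restrict matches to whole words, so 'sidewalks' is left alone
    pattern = r'\b(?:' + '|'.join(re.escape(form) for form in word_forms) + r')\b'
    return re.sub(pattern, target, text)

print(clean_word_sketch("Vessels caught misreporting fish",
                        ['misreport', 'misreports', 'misreporting', 'misreported']))
# Vessels caught misreport fish
```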


### Testing on some of our content

#### Importing the articles from the file

In [100]:
import pandas as pd

# read the excel file
excel_data = pd.read_excel('Classification_test/df_content.xlsx')

new_data = excel_data.copy()

texts = new_data.iloc[:3]['Content'].tolist()

###  <font color = 'red' >Warning: This function only handles verbs that have more than one form.</font>
### <font color = 'red' >  If you also want the text in lower case with no special characters, it is recommended to apply those cleaning functions before rep_word_text. </font>
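
The lower-casing and special-character functions mentioned in the warning are not shown in this section. A minimal sketch of such a step follows, assuming that "special characters" means anything other than ASCII letters, digits, and whitespace. `normalize_text` is a hypothetical name.

```python
import re

def normalize_text(text):
    """Lowercase the text, replace special characters with spaces, collapse whitespace.

    Hypothetical helper; the definition of "special characters" is an assumption.
    """
    text = text.lower()
    # Anything that is not a lowercase letter, digit, or whitespace becomes a space
    text = re.sub(r'[^a-z0-9\s]', ' ', text)
    # Collapse runs of whitespace (including non-breaking spaces) into one space
    return re.sub(r'\s+', ' ', text).strip()

print(normalize_text("Illegal, Unreported, and Unregulated (IUU) fishing.\xa0"))
# illegal unreported and unregulated iuu fishing
```

Following the warning above, the normalized string would then be passed on, for example `rep_word_text(normalize_text(texts[0]))`.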

In [101]:
rep_word_text(texts[0])

The verb: stands 
   basic form:  stand 
   the list:  ['stands', 'stood', 'stand', 'standing']
The verb: deploying 
   basic form:  deploy 
   the list:  ['deployed', 'deploys', 'deploy', 'deploying']
The verb: protect 
   basic form:  protect 
   the list:  ['protect', 'protected', 'protects', 'protecting']
The verb: fishing 
   basic form:  fish 
   the list:  ['fishes', 'fish', 'fishing', 'fished']
The verb: fishing 
   basic form:  fish 
   the list:  ['fishes', 'fish', 'fishing', 'fished']
The verb: do 
   basic form:  do 
   the list:  ['doing', 'does', 'do', 'done', 'did']
The verb: comply 
   basic form:  comply 
   the list:  ['complies', 'complying', 'complied', 'comply']
The verb: are 
   basic form:  be 
   the list:  ['be', 'was', 'is', 'being', 'am', 'been']
The verb: conducted 
   basic form:  conduct 
   the list:  ['conduct', 'conducted', 'conducts', 'conducting']
The verb: including 
   basic form:  include 
   the list:  ['included', 'include', 'including', 'include

The verb: establish 
   basic form:  establish 
   the list:  ['establishes', 'establishing', 'established', 'establish']
The verb: illuminate 
   basic form:  illuminate 
   the list:  ['illuminate', 'illuminating', 'illuminated', 'illuminates']
The verb: need 
   basic form:  need 
   the list:  ['need', 'needing', 'needs', 'needed']
The verb: be 
   basic form:  be 
   the list:  ['be', 'was', 'is', 'being', 'am', 'been']
The verb: amended 
   basic form:  amend 
   the list:  ['amend', 'amends', 'amending', 'amended']
The verb: increased 
   basic form:  increase 
   the list:  ['increased', 'increase', 'increases', 'increasing']
The verb: deter 
   basic form:  deter 
   the list:  ['deterred', 'deterring', 'deter', 'deters']
The verb: enhance 
   basic form:  enhance 
   the list:  ['enhanced', 'enhancing', 'enhance', 'enhances']
The verb: monitor 
   basic form:  monitor 
   the list:  ['monitors', 'monitor', 'monitoring', 'monitored']
The verb: track 
   basic form:  track 
   

The verb: given 
   basic form:  give 
   the list:  ['give', 'gives', 'gave', 'giving', 'given']
The verb: recognizing 
   basic form:  recognize 
   the list:  ['recognize', 'recognized', 'recognizes', 'recognizing']
The verb: began 
   basic form:  begin 
   the list:  ['begins', 'began', 'beginning', 'begun', 'begin']
The verb: working 
   basic form:  work 
   the list:  ['working', 'worked', 'work', 'works']
The verb: assisting 
   basic form:  assist 
   the list:  ['assists', 'assisted', 'assist', 'assisting']
The verb: fishing 
   basic form:  fish 
   the list:  ['fishes', 'fish', 'fishing', 'fished']
The verb: were 
   basic form:  be 
   the list:  ['be', 'was', 'is', 'being', 'am', 'been']
The verb: were 
   basic form:  be 
   the list:  ['be', 'was', 'is', 'being', 'am', 'been']
The verb: prosecuted 
   basic form:  prosecute 
   the list:  ['prosecuted', 'prosecuting', 'prosecute', 'prosecutes']
The verb: given 
   basic form:  give 
   the list:  ['give', 'gives', 'gav

The verb: is 
   basic form:  be 
   the list:  ['be', 'was', 'is', 'being', 'am', 'been']
The verb: join 
   basic form:  join 
   the list:  ['joined', 'join', 'joins', 'joining']
The verb: protecting 
   basic form:  protect 
   the list:  ['protect', 'protected', 'protects', 'protecting']
The verb: ensuring 
   basic form:  ensure 
   the list:  ['ensure', 'ensured', 'ensures', 'ensuring']
The verb: be 
   basic form:  be 
   the list:  ['be', 'was', 'is', 'being', 'am', 'been']
The verb: turn 
   basic form:  turn 
   the list:  ['turning', 'turns', 'turn', 'turned']
The verb: secure 
   basic form:  secure 
   the list:  ['secure', 'securing', 'secures', 'secured']
The verb: come 
   basic form:  come 
   the list:  ['comes', 'came', 'coming', 'come']


' Wednesday, 05 Jun, 2024 Sea Shepherd Global stands at the forefront of the fight against Illegal, Unreported, and Unregulated (IUU) fishes, deployed innovative strategies and international collaborations to protect marine biodiversity.\xa0 Illegal, Unreported, and Unregulated (IUU) fishes refers to fishes activities that doing not complies with national, regional, or international fisheries conservation and management laws and regulations. These activities are conduct by vessels in various ways, included: \nIn 2015, the United Nations General Assembly highlights the grave issue of IUU fishes, recognize it as a major threat on multiple levels: Threat to Marine Wildlife and Ecosystems: IUU fishes severely impacting marine biodiversity. It decimating fishes populations, disrupted food chains, and results in high bycatch rates, where non-targeting species like dolphins, sharks, and turtles are unintentionally catch and kill. Bycatch from IUU fishes can be extensive, with hundreds of thou

'engage'