# Dictionary Tutorial

This notebook demonstrates the use of the `FewShotX` package, available for download [here](https://github.com/RenatoVassallo/FewShotX).

We'll score each headline by counting how many of its words appear in a small, hand-built dictionary, using spaCy's `en_core_web_sm` model for tokenization and stop-word filtering.
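The counting idea itself can be sketched in plain Python before involving the package. This is only an illustration under stated assumptions, not how `FewShotX` is implemented: the helper names `preprocess` and `count_matches` are hypothetical, the stop-word set is a tiny stand-in for spaCy's list, and a regular expression replaces spaCy's tokenizer.

```python
import re

# Hypothetical stand-in for a stop-word list; spaCy's own list is much larger.
STOP_WORDS = {"a", "an", "and", "as", "for", "in", "of", "on", "the", "to"}


def preprocess(text):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


def count_matches(text, dictionaries):
    """Return, for each dictionary, how many tokens appear in its term list."""
    tokens = preprocess(text)
    counts = {}
    for name, terms in dictionaries.items():
        term_set = set(terms)  # set lookup keeps each membership test cheap
        counts[name] = sum(1 for tok in tokens if tok in term_set)
    return counts


dictionary = {"economy": ["economy", "consumption", "inflation", "investment", "invest", "confidence"]}

# Made-up example sentence, not taken from the corpus.
example = "Confidence in the economy lifts investment"
print(preprocess(example))                 # ['confidence', 'economy', 'lifts', 'investment']
print(count_matches(example, dictionary))  # {'economy': 3}
```

Matching in this sketch is exact, so a token such as `investors` would not match the term `invest`. Whether the package also matches on lemmas is not something this sketch assumes either way.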

In [1]:
import pandas as pd

url = "https://github.com/RenatoVassallo/FewShotX/raw/main/src/FewShotX/datasets/econland_corpus.csv"

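# Read the CSV directly from the raw link and draw a reproducible sample of 100 rows
df_corpus = pd.read_csv(url).sample(100, random_state=42)
# Keep only the headline text and its label, and renumber the rows from 0
df_corpus = df_corpus[["headline", "is_financial"]].reset_index(drop=True)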
df_corpus.head()

| | headline | is_financial |
|---|---|---|
| 0 | The central bank signaled a pause in interest ... | 1 |
| 1 | Investors show growing confidence in emerging ... | 1 |
| 2 | City Council Debates Infrastructure Plan | 0 |
| 3 | Investment in Tech Sector Slows Amid Global Un... | 1 |
| 4 | New coach implements strict budgeting for play... | 0 |


In [2]:
from FewShotX import DictionaryScorer

# Create a simple economic dictionary
dictionary = {"economy": ["economy", "consumption", "inflation", "investment", "invest", "confidence"]}

# Apply our scorer to the corpus
scorer = DictionaryScorer(dictionaries=dictionary, model_name="en_core_web_sm")
df_dict = scorer.score_df(df_corpus, text_col="headline")
df_dict.head()

Dictionary scoring with spaCy: 100%|██████████| 100/100 [00:00<00:00, 1002.82it/s]


| | headline | is_financial | economy | preprocessed_headline |
|---|---|---|---|---|
| 0 | The central bank signaled a pause in interest ... | 1 | 0 | [central, bank, signaled, pause, interest, rat... |
| 1 | Investors show growing confidence in emerging ... | 1 | 1 | [investors, growing, confidence, emerging, mar... |
| 2 | City Council Debates Infrastructure Plan | 0 | 0 | [city, council, debates, infrastructure, plan] |
| 3 | Investment in Tech Sector Slows Amid Global Un... | 1 | 1 | [investment, tech, sector, slows, amid, global... |
| 4 | New coach implements strict budgeting for play... | 0 | 0 | [new, coach, implements, strict, budgeting, pl... |
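
Since the corpus carries an `is_financial` label, one natural follow-up is to check how often a nonzero `economy` score coincides with that label. The sketch below rebuilds only the five rows displayed above (headlines are left out because they are truncated). The names `df_preview` and `dict_flag` are hypothetical, and the "score greater than zero" rule is an assumption made for illustration, not something the package prescribes.

```python
import pandas as pd

# The five rows displayed above: label and dictionary score only.
df_preview = pd.DataFrame({
    "is_financial": [1, 1, 0, 1, 0],
    "economy": [0, 1, 0, 1, 0],
})

# Assumed rule: flag a headline when it contains at least one dictionary term.
df_preview["dict_flag"] = (df_preview["economy"] > 0).astype(int)

# Cross-tabulate the label (rows) against the dictionary flag (columns).
confusion = pd.crosstab(df_preview["is_financial"], df_preview["dict_flag"])
print(confusion)

# Share of rows where the flag agrees with the label.
agreement = (df_preview["dict_flag"] == df_preview["is_financial"]).mean()
print(f"Agreement: {agreement:.2f}")  # Agreement: 0.80
```

Five rows are far too few to draw conclusions from. The same lines should apply to the full scored frame by using `df_dict` in place of `df_preview`.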
