# Keyword Extraction

In [35]:
title = 'Consumer prices rose 8.5% in March, slightly hotter than expected and the highest since 1981'

In [36]:
doc = """
Prices that consumers pay for everyday items surged in March to their highest levels since the early days of the Reagan administration, according to Labor Department data released Tuesday.

The consumer price index, which measures a wide-ranging basket of goods and services, jumped 8.5% from a year ago on an unadjusted basis, above even the already elevated Dow Jones estimate for 8.4%.

Excluding food and energy, so-called core CPI increased 6.5% on a 12-month basis, in line with the expectation. However, there were signs that core inflation appeared to be ebbing, as it rose just 0.3% for the month, less than the 0.5% estimate. That in turn sparked some hope that inflation overall was easing and that March might represent the peak.

Markets reacted positively to the report as stocks rose and government bond yields declined.

“The big news in the March report was that core price pressures finally appear to be moderating,” wrote Andrew Hunter, senior U.S. economist at Capital Economics. Hunter said he thinks the March increase will “mark the peak” for inflation as year-over-year comparisons drive the numbers lower and energy prices subside.

Federal Reserve Governor Lael Brainard said the slowing increase in core CPI is a “welcome” development in the effort to bring down inflation.

“I’ll be looking to see whether we continue to see moderation in the months ahead,” Brainard told the Wall Street Journal.

The data reflected price rises not seen in the U.S. since the stagflation days of the late 1970s and early ’80s. March’s headline reading in fact was the highest since December 1981. Core inflation was the hottest since August 1982.

Due to the surge in inflation, worker wages, despite rising 5.6% from a year ago, weren’t keeping pace with the cost of living. Real average hourly earnings posted a seasonally adjusted 0.8% decline for the month, according to a separate Bureau of Labor Statistics report.

The inability of wages to keep up with costs could add to inflation pressures.

WATCH NOW
VIDEO02:20
Morning Meeting sneak peek: Jim Cramer breaks down today’s inflation report
The Atlanta Federal Reserve wage tracker for March indicated gains of another 6% which is “symptomatic of inflation pressures continuing to broaden,” said Brian Coulton, chief economist at Fitch Ratings. Coulton pointed out that the core inflation deceleration was due largely to a drop in auto prices, while other prices continued to show increases.

Shelter costs, which make up about one-third of the CPI weighting, increased another 0.5% on the month, making the 12-month gain a blistering 5%, the highest since May 1991.

To combat inflation, the Fed has begun raising interest rates and is expected to continue doing so through the remainder of the year and into 2023. The last time prices were this high, the Fed raised its benchmark rate to nearly 20%, pulling the economy into a recession that finally defeated inflation.

Economists generally don’t expect a recession this time around, though many on Wall Street are raising the probability of a downturn.

“Overall, this report is encouraging, at the margin, though it is far too soon to be sure that the next few core prints will be as low; much depends on the path of used vehicle prices, which is very hard to forecast with confidence,” wrote Ian Shepherdson, chief economist at Pantheon Macroeconomics. “We’re sure they will fall, but the speed of the decline is what matters.”

Price increases came from many of the usual culprits.

Food rose 1% for the month and 8.8% over the year, as prices for goods such as rice, ground beef, citrus fruits and fresh vegetables all posted gains of more than 2% in March. Energy prices were up 11% and 32%, respectively, as gasoline prices popped 18.3% for the month, boosted by the war in Ukraine and the pressure it is exerting on supply.

One sector that has been a major driver in the inflation burst subsided in March. Used car and truck prices declined 3.8% for the month, though they are still up 35.3% on the year. Also, commodity prices excluding food and energy fell by 0.4%.

Those declines, however, were offset by gains in clothing, services excluding energy and medical care, each of which increased 0.6% for the month. Transportation services also rose 2%, bringing its 12-month gain to 7.7%.

In a sign of economic recovery from a sector hard-hit during the Covid pandemic, airline fares jumped by 10.7% in the month and were up 23.6% from a year ago.

"""

## Rake_NLTK

* Split the input text at delimiters such as periods (RAKE also splits at stop words to form candidate phrases)
* Create a matrix of word co-occurrences
* Score each word: the score can be the word's degree in the matrix, its frequency, or its degree divided by its frequency
* Keyphrases can also be formed by combining keywords
* A keyword or keyphrase is selected if and only if its score is among the top T scores, where T is the number of keywords to extract
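
The steps above can be sketched in plain Python. This is a minimal illustration, not the `rake_nltk` implementation: the stop-word list, the tokenizing regex, and the helper names (`candidate_phrases`, `rake_scores`, `top_keyphrases`) are all assumptions made for the example, and only the degree/frequency scoring variant is shown.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption; real RAKE uses a much larger one)
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "is", "are", "for", "that", "which"}

def candidate_phrases(text):
    """Split text at punctuation, then at stop words, into candidate phrases."""
    phrases = []
    for fragment in re.split(r"[.,;:!?\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9%']+", fragment):
            if word in STOP_WORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases

def rake_scores(text):
    """Score each phrase as the sum of degree(word) / frequency(word)."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences within the phrase, including the word itself
            degree[word] += len(phrase)
    return {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in set(phrases)}

def top_keyphrases(text, t):
    """Return the T highest-scoring phrases with their scores."""
    scores = rake_scores(text)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:t]

sample = "Core inflation rose in March. Energy prices and core inflation are the drivers."
print(top_keyphrases(sample, 3))
```

Longer phrases tend to win under this scoring because every word in a phrase adds the phrase length to its degree, which is why "core inflation rose" outranks "core inflation" in the sample.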

## Spacy

## Textrank

## Word cloud

## KeyBert

KeyBERT is a simple, easy-to-use keyword extraction technique that uses BERT embeddings to find the keywords and keyphrases most similar to a given document. It embeds the document and its candidate sub-phrases with BERT, then uses cosine similarity to find the sub-phrases that are most similar to the document itself.

* Uses BERT embeddings to find the keywords most similar to the document
* Extracts keywords by ranking candidates with cosine similarity
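
To make the cosine-similarity ranking concrete, here is a toy sketch in plain Python. The 3-dimensional vectors are made up for illustration and are not real BERT embeddings; the `cosine` helper is likewise an assumption of this example rather than part of KeyBERT.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up vectors standing in for the document and candidate embeddings
doc_vec = [1.0, 1.0, 0.0]
candidate_vecs = {
    "inflation": [1.0, 0.9, 0.1],
    "recession": [0.5, 1.0, 0.5],
    "airline": [0.0, 0.1, 1.0],
}

# Rank candidates by similarity to the document, most similar first
ranked = sorted(candidate_vecs, key=lambda w: cosine(doc_vec, candidate_vecs[w]), reverse=True)
print(ranked)
```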

### Using KeyBERT

In [11]:
# !pip install keybert

In [None]:
# ECONOMY
# Consumer prices rose 8.5% in March, slightly hotter than expected and the highest since 1981

In [13]:
# Method 1

from keybert import KeyBERT

# doc is the news article defined at the top

kw_model = KeyBERT()
keywords = kw_model.extract_keywords(doc)
print(keywords)

[('inflation', 0.5043), ('cpi', 0.4757), ('recession', 0.3989), ('march', 0.335), ('macroeconomics', 0.3087)]


In [39]:
# Method 2
from sklearn.feature_extraction.text import CountVectorizer

# Extract phrases made of three consecutive words (trigrams)
n_gram_range = (3, 3)
stop_words = "english"

count = CountVectorizer(ngram_range=n_gram_range, stop_words=stop_words).fit([doc])
# Keyword candidates: trigrams with stop words removed
candidates = count.get_feature_names_out()

print('Number of trigrams:', len(candidates))
print('First five trigrams:', candidates[:5])

Number of trigrams: 409
First five trigrams: ['10 month 23' '11 32 respectively' '12 month basis' '12 month gain'
 '18 month boosted']


In [42]:
# Turn the text into vectors
# Use a pre-trained model
# https://www.sbert.net/docs/pretrained_models.html
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('distilbert-base-nli-mean-tokens')

# Encode the whole document (doc) and the keyword candidates as embedding vectors
doc_embedding = model.encode([doc])
candidate_embeddings = model.encode(candidates)

In [45]:
doc_embedding[0]

array([ 2.34511271e-02, -6.65197968e-01,  3.28258693e-01, -8.51862907e-01,
       -6.59396797e-02,  1.98907226e-01, -9.41708624e-01, -8.66065562e-01,
       -4.98717010e-01,  2.87233770e-01,  9.69592094e-01,  7.33804345e-01,
       -4.53447372e-01,  6.18324280e-01,  4.87614065e-01, -1.21339536e+00,
        1.21255517e-01,  3.67295444e-02,  4.24046159e-01, -2.00928897e-02,
        4.47170436e-01,  1.76481813e-01,  1.64404541e-01,  1.00772965e+00,
       -7.94420540e-01, -5.30070484e-01, -1.97389483e-01, -6.69744015e-01,
        8.45306143e-02,  4.06941205e-01, -1.70291096e-01,  1.22230530e-01,
       -2.90161431e-01, -8.09550062e-02, -7.34419376e-02,  1.56845711e-02,
       -1.78943440e-01, -1.69675469e-01,  3.65505517e-01, -1.52073193e+00,
       -2.48372316e-01, -9.96205091e-01,  6.99268579e-02,  2.65412867e-01,
       -2.67742932e-01, -2.62911245e-03, -1.41044557e-01, -3.15267622e-01,
        3.44910622e-01,  1.00471407e-01, -4.55298454e-01,  4.89116162e-01,
        1.12212598e-01, -

In [47]:
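# Extract the keywords most similar to the document
from sklearn.metrics.pairwise import cosine_similarity
top_n = 5
# Despite the name, these values are similarities (higher = closer to the document)
distances = cosine_similarity(doc_embedding, candidate_embeddings)
# argsort() sorts ascending, so the last top_n indices are the most similar candidates
keywords = [candidates[index] for index in distances.argsort()[0][-top_n:]]
print(keywords)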

['tuesday consumer price', 'recession finally defeated', 'gasoline prices popped', 'economy recession finally', 'peak inflation year']


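### Diversifying keywords 1: Max Sum Similarity<br>
The max sum distance between data pairs is defined as the pairs for which the distance between them is maximized.<br>
Here that means maximizing each candidate's similarity to the document while minimizing the similarity among the selected candidates.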

In [57]:
import itertools

import numpy as np

def max_sum_sim(doc_embedding, candidate_embeddings, words, top_n, nr_candidates):
    # e.g. nr_candidates=10, top_n=5: take the 10 candidates closest to the document,
    # then choose the 5 of them that are least similar to each other
    # Similarity between the document and each candidate
    distances = cosine_similarity(doc_embedding, candidate_embeddings)

    # Pairwise similarity between candidates
    distances_candidates = cosine_similarity(candidate_embeddings,
                                            candidate_embeddings)

    # Keep the nr_candidates candidates most similar to the document
    words_idx = list(distances.argsort()[0][-nr_candidates:])
    words_vals = [words[index] for index in words_idx]
    distances_candidates = distances_candidates[np.ix_(words_idx, words_idx)]

    # Among those, find the combination of top_n candidates with the lowest total pairwise similarity
    min_sim = np.inf
    candidate = None
    for combination in itertools.combinations(range(len(words_idx)), top_n):
        sim = sum([distances_candidates[i][j] for i in combination for j in combination if i != j])
        if sim < min_sim:
            candidate = combination
            min_sim = sim

    return [words_vals[idx] for idx in candidate]

In [58]:
max_sum_sim(doc_embedding, candidate_embeddings, candidates, top_n=5, nr_candidates=10)

['inflation hottest august',
 'weighting increased month',
 'tuesday consumer price',
 'gasoline prices popped',
 'economy recession finally']

In [59]:
# Code I did not understand at first
# Is it meaningful to compute similarities between the same set of embeddings?
# distances_candidates = cosine_similarity(candidate_embeddings, candidate_embeddings)

# words_idx = list(distances.argsort()[0][-nr_candidates:])

In [61]:
cosine_similarity(candidate_embeddings,candidate_embeddings)

array([[1.0000001 , 0.616074  , 0.8208634 , ..., 0.6212555 , 0.6630509 ,
        0.28212598],
       [0.616074  , 0.9999999 , 0.6439796 , ..., 0.5804162 , 0.5693418 ,
        0.44779062],
       [0.8208634 , 0.6439796 , 1.        , ..., 0.635149  , 0.65534395,
        0.2869252 ],
       ...,
       [0.6212555 , 0.5804162 , 0.635149  , ..., 1.0000001 , 0.75686044,
        0.4660572 ],
       [0.6630509 , 0.5693418 , 0.65534395, ..., 0.75686044, 1.0000002 ,
        0.30874515],
       [0.28212598, 0.44779062, 0.2869252 , ..., 0.4660572 , 0.30874515,
        1.0000002 ]], dtype=float32)
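
On the question of whether comparing the candidate embeddings with themselves is meaningful: passing the same matrix twice yields an n × n matrix whose entry (i, j) is the similarity between candidate i and candidate j. Only the diagonal compares a vector with itself (hence the values of about 1 in the output above); the off-diagonal entries are what `max_sum_sim` uses to find candidates that differ from each other. A toy sketch in plain Python, with made-up 2-dimensional vectors and a hypothetical `pairwise_cosine` helper, shows the shape of the result:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def pairwise_cosine(vectors):
    """Entry [i][j] is the similarity between vectors[i] and vectors[j]."""
    return [[cosine(u, v) for v in vectors] for u in vectors]

# Made-up vectors standing in for three candidate embeddings
vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
matrix = pairwise_cosine(vectors)
for row in matrix:
    print([round(x, 3) for x in row])
```

The matrix is symmetric, so each pair appears twice; this is why summing over all `i != j` in `max_sum_sim` counts every pair twice without changing which combination has the lowest total.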

### Diversifying keywords 2: Maximal Marginal Relevance<br>
Aims to minimize redundancy and maximize the diversity of the results

In [54]:
import numpy as np

def mmr(doc_embedding, candidate_embeddings, words, top_n, diversity):

    # Similarity between each candidate keyword and the document
    word_doc_similarity = cosine_similarity(candidate_embeddings, doc_embedding)

    # Pairwise similarity between candidate keywords
    word_similarity = cosine_similarity(candidate_embeddings)

    # Index of the candidate most similar to the document.
    # For example, if candidate 2 scores highest,
    # keywords_idx = [2]
    keywords_idx = [np.argmax(word_doc_similarity)]

    # Indices of all remaining candidates.
    # If candidate 2 scored highest,
    # ==> candidates_idx = [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, ...]
    candidates_idx = [i for i in range(len(words)) if i != keywords_idx[0]]

    # The best keyword is already chosen, so repeat top_n - 1 times.
    # e.g. with top_n = 5 the loop runs 4 times.
    for _ in range(top_n - 1):
        candidate_similarities = word_doc_similarity[candidates_idx, :]
        target_similarities = np.max(word_similarity[candidates_idx][:, keywords_idx], axis=1)

        # Compute the MMR score of every remaining candidate
        mmr_scores = (1-diversity) * candidate_similarities - diversity * target_similarities.reshape(-1, 1)
        mmr_idx = candidates_idx[np.argmax(mmr_scores)]

        # Update the selected keywords and the remaining candidates
        keywords_idx.append(mmr_idx)
        candidates_idx.remove(mmr_idx)

    return [words[idx] for idx in keywords_idx]

In [55]:
mmr(doc_embedding, candidate_embeddings, candidates, top_n=5, diversity=0.2)

['peak inflation year',
 'economy recession finally',
 'tuesday consumer price',
 'gasoline prices popped',
 'inflation hottest august']
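
The selection step inside the loop of `mmr` can be written as a formula. The symbols are notation chosen here for illustration: $d$ is the document, $C$ the set of candidates, $S$ the keywords already selected, $\lambda$ the `diversity` argument, and $\mathrm{sim}$ cosine similarity.

$$
c^{*} = \arg\max_{c \in C \setminus S} \Big[ (1-\lambda)\,\mathrm{sim}(c, d) \;-\; \lambda \max_{s \in S} \mathrm{sim}(c, s) \Big]
$$

As a worked example with hypothetical numbers: for $\lambda = 0.2$, a candidate with $\mathrm{sim}(c, d) = 0.6$ whose closest selected keyword has $\mathrm{sim}(c, s) = 0.9$ scores $0.8 \times 0.6 - 0.2 \times 0.9 = 0.30$. Raising $\lambda$ penalizes that overlap more heavily, so the results become more diverse and less tied to the document.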

## YAKE (Yet Another Keyword Extractor)

YAKE is a simple unsupervised keyword extraction method that identifies the most relevant keywords in a text using statistical features computed from that single text. It does not rely on dictionaries or external corpora, is not tied to a particular text size, language, or domain, and does not require training on a specific set of documents.

* Unsupervised approach
* Corpus-Independent
* Domain and Language Independent (works regardless of document content, size, etc.)
* Single-Document (works on one document at a time, with no collection required)


### Using YAKE

In [14]:
# !pip install yake

In [17]:
import yake

In [18]:
kw_extractor = yake.KeywordExtractor()
keywords =  kw_extractor.extract_keywords(doc)
for kw in keywords:
    print(kw)

('data released Tuesday', 0.009428058051207406)
('Department data released', 0.0096234876313734)
('everyday items surged', 0.012462717013554392)
('Reagan administration', 0.014818631568493707)
('released Tuesday', 0.014917608852455437)
('Labor Department data', 0.01516161268816772)
('Dow Jones estimate', 0.02147471627112143)
('Labor Department', 0.02254836536627868)
('inflation', 0.023716550782715107)
('March', 0.024271860360091033)
('elevated Dow Jones', 0.025174206418134455)
('Prices', 0.029962904876262086)
('core inflation', 0.03724745085229474)
('Department data', 0.03804939066847838)
('month', 0.04095325379825359)
('Dow Jones', 0.04218739741688678)
('core', 0.046571555082663835)
('core CPI', 0.05084221065090602)
('pay for everyday', 0.05169130201458279)
('everyday items', 0.05169130201458279)


In [62]:
kw[1]

0.05169130201458279
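
Unlike the cosine-similarity scores earlier, YAKE scores read the other way around: the lower the score, the more relevant the keyword. The sketch below uses a few pairs copied from the output above; the `threshold` value is a hypothetical cutoff picked only for illustration.

```python
# A few (keyword, score) pairs copied from the YAKE output above
keywords = [
    ('inflation', 0.023716550782715107),
    ('data released Tuesday', 0.009428058051207406),
    ('core CPI', 0.05084221065090602),
    ('Reagan administration', 0.014818631568493707),
    ('everyday items', 0.05169130201458279),
]

# Lower score = more relevant, so sort ascending to rank best-first
ranked = sorted(keywords, key=lambda kw: kw[1])

# Hypothetical cutoff, chosen only for illustration
threshold = 0.03
selected = [phrase for phrase, score in ranked if score < threshold]
print(selected)
```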

## MonkeyLearn API

Advantages of automating keyword extraction:

* Keywords can be extracted from product descriptions, customer feedback, and other sources.
* It shows which terms customers use most frequently.
* Brand, product, and service mentions can be monitored in real time.
* Data extraction and entry can be automated and sped up.


### Using the MonkeyLearn API

In [19]:
!pip install monkeylearn

In [29]:
from monkeylearn import MonkeyLearn

ml = MonkeyLearn('ba02694bb0686d4e9dfefb9f26723346ce53a278')

data = [doc]
model_id = 'ex_YCya9nrn'
result = ml.extractors.extract(model_id, data)
dataDict = result.body
for item in dataDict[0]['extractions'][:10]:
    print("Keyword: ", item['parsed_value'])
    print("Relevance: ", item['relevance'])

Keyword:  inflation
Relevance:  0.980
Keyword:  inflation pressure
Relevance:  0.782
Keyword:  chief economist
Relevance:  0.782
Keyword:  energy price
Relevance:  0.782
Keyword:  month
Relevance:  0.600
Keyword:  dow jones estimates
Relevance:  0.587
Keyword:  used vehicle price
Relevance:  0.587
Keyword:  government bond yield
Relevance:  0.587
Keyword:  federal reserve wage
Relevance:  0.587
Keyword:  core price pressure
Relevance:  0.587


In [25]:
dataDict[0].keys()

dict_keys(['text', 'external_id', 'error', 'extractions'])

In [27]:
dataDict[0]['extractions']

[{'tag_name': 'KEYWORD',
  'parsed_value': 'inflation',
  'count': 13,
  'relevance': '0.980',
  'positions_in_text': [539,
   673,
   1067,
   1291,
   1616,
   1682,
   1994,
   2096,
   2219,
   2351,
   2650,
   2933,
   3905]},
 {'tag_name': 'KEYWORD',
  'parsed_value': 'inflation pressure',
  'count': 2,
  'relevance': '0.782',
  'positions_in_text': [1994, 2219]},
 {'tag_name': 'KEYWORD',
  'parsed_value': 'chief economist',
  'count': 2,
  'relevance': '0.782',
  'positions_in_text': [2283, 3337]},
 {'tag_name': 'KEYWORD',
  'parsed_value': 'energy price',
  'count': 2,
  'relevance': '0.782',
  'positions_in_text': [1135, 3688]},
 {'tag_name': 'KEYWORD',
  'parsed_value': 'month',
  'count': 12,
  'relevance': '0.600',
  'positions_in_text': [460,
   601,
   1373,
   1869,
   2562,
   2583,
   3533,
   3777,
   3988,
   4243,
   4304,
   4443]},
 {'tag_name': 'KEYWORD',
  'parsed_value': 'dow jones estimates',
  'count': 1,
  'relevance': '0.587',
  'positions_in_text': [361]}
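
Note that `relevance` comes back as a string (`'0.980'`), so it has to be converted before any numeric comparison. A minimal sketch, using three entries abridged from the output above (the `positions_in_text` field is left out); the `to_rows` helper and the 0.6 cutoff are assumptions made for illustration:

```python
# Three entries abridged from the 'extractions' output above
extractions = [
    {'tag_name': 'KEYWORD', 'parsed_value': 'inflation', 'count': 13, 'relevance': '0.980'},
    {'tag_name': 'KEYWORD', 'parsed_value': 'month', 'count': 12, 'relevance': '0.600'},
    {'tag_name': 'KEYWORD', 'parsed_value': 'dow jones estimates', 'count': 1, 'relevance': '0.587'},
]

def to_rows(items):
    """Turn raw extraction dicts into (keyword, relevance as float, count) tuples."""
    return [(item['parsed_value'], float(item['relevance']), item['count']) for item in items]

rows = to_rows(extractions)

# Hypothetical cutoff: keep keywords with relevance of at least 0.6
strong = [name for name, relevance, _ in rows if relevance >= 0.6]
print(strong)
```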

## Textrazor API

Textrazor is a good choice for developers who need fast extraction tools with comprehensive customization options.

The example below passes the article URL via `analyze_url`<br>
Free tier: 500 requests per day

### Using the Textrazor API

In [30]:
# !pip install textrazor

In [34]:
import textrazor
textrazor.api_key = '36ca83664d76dbac7629b52bc825052e00416a283515071f6248be98'
client = textrazor.TextRazor(extractors=["entities", "topics"])
response = client.analyze_url("https://www.cnbc.com/2022/04/12/consumer-prices-rose-8point5percent-in-march-slightly-hotter-than-expected.html")
for entity in response.entities():
    print(entity.id, entity.relevance_score, entity.confidence_score)

Cost 0.2703 2.33
United States 0.3843 24.46
Economics 0.5174 18.84
Wage 0.2983 2.763
Economics 0.5058 18.84
Stagflation 0.6904 13.13
1970s 0.3816 1.493
Inflation 0.9164 18.67
Cost 0.1943 1.883
Cost of living 0.4552 2.815
News 0.3461 1.506
Pressure 0.1722 1.149
Food 0.2683 5.453
Consumer 0.1031 5.149
Price 0.469 3.884
Consumer price index 0.6144 17.72
Price index 0.4941 2.975
Index (economics) 0.7378 2.295
Basket 0.0254 1.646
Inflation 0.9164 17.83
Wall Street 0.4894 7.004
The Wall Street Journal 0.4902 37.83
Data 0.1183 0.963
Lael Brainard 0.2864 2.401
Government bond 0.4045 4.07
Bond (finance) 0.4625 1.293
Hope 0.184 0.9824
Bureau of Labor Statistics 0.09554 16.11
Statistics 0.2427 1.573
Jim Cramer 0.3229 5.534
Atlanta 0.2686 9.267
Wage 0.2983 3.58
Signs and symptoms 0.1746 1.783
Chief economist 0.06688 1.333
Economics 0.495 18.84
Fitch Ratings 0.09905 5.477
Inflation 0.9164 18.43
Acceleration 0.1222 2.255
Goods 0.3258 1.317
Goods and services 0.1581 1.431
Federal Reserve 0.4816 11.58

In [None]:
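We cannot read all 2,500 news articles behind the keywords -> provide summaries instead ->
extract keywords by frequency -> provide the keywords -> summarization is hard
-> provide only the titles of the 2,500 condensed articles + extract and provide the key sentences that match each keyword

* Current problems
1. News crawling (forex) fails to scrape article bodies (in Jihye's set, only 70 of 1,700 articles had their body scraped)
2. It is unclear exactly what to implement
1) Extracting keywords alone does not need any separate deep learning
2) finBERT returns scores for sentences, not for keywords

-> Either find a way to get keyword scores from finBERT or work at the sentence level; a follow-up meeting is needed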
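
The idea of providing the key sentences that match each keyword could start from something as simple as the sketch below. It is only a rough illustration: the regex-based sentence splitting, the substring matching, and the `key_sentences` helper are all assumptions, and the sample sentences are abridged from the article at the top.

```python
import re

def key_sentences(text, keyword, limit=2):
    """Return up to `limit` sentences that mention the keyword (case-insensitive)."""
    # Split after sentence-ending punctuation followed by whitespace,
    # so decimals such as "0.5%" stay inside their sentence
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    needle = keyword.lower()
    return [s for s in sentences if needle in s.lower()][:limit]

# Sentences abridged from the article used throughout this notebook
sample = (
    "Markets reacted positively to the report. "
    "Core inflation was the hottest since August 1982. "
    "Shelter costs increased another 0.5% on the month. "
    "To combat inflation, the Fed has begun raising interest rates."
)
print(key_sentences(sample, "inflation"))
```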
