
Malang News: Making Stiff News Soft

Malang News is a service that condenses long, complex news articles into short summaries and explains their key keywords in a question-and-answer format.
Stiff sentences can be rewritten in a different speech style, and you can adjust how extensively the service asks questions about the keywords.
The models are fine-tuned for the IT/science and finance domains, where difficult terms are common.
Team Notion
Demo video

Using the service

$ pip install -r requirements.txt
$ streamlit run streamlit/malang_news.py
  1. Enter the URL of the news article you want (optimized for Naver News)
  2. Wait for inference to finish and view the results
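
Because the app is optimized for Naver News, it can help to check a URL before submitting it. The following is a minimal sketch, not the project's own logic: the helper name `is_naver_news_url` and the assumption that articles are served from `news.naver.com` or one of its subdomains are illustrative only.

```python
from urllib.parse import urlparse

# Assumption: Naver News articles live on news.naver.com or a subdomain
# such as n.news.naver.com. The real app may accept other hosts.
NAVER_NEWS_HOST = "news.naver.com"


def is_naver_news_url(url: str) -> bool:
    """Return True if `url` is an http(s) URL on a Naver News host."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    # Accept the bare host or any subdomain, but not look-alike hosts.
    return host == NAVER_NEWS_HOST or host.endswith("." + NAVER_NEWS_HOST)
```

A check like this only confirms the host. It does not guarantee that the article exists, so the app can still report that the news could not be found.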

Notes

  • You must enter your own Hugging Face API key and OpenAI API key in malang_news.py.
API_TOKEN = "hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # Huggingface
API_KEY = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" # OpenAI
  • 아직 λͺ¨λΈμ„ λΆˆλŸ¬μ˜€λŠ” μ€‘μ΄μ—μš”. μ•ˆλ‚΄ 문ꡬ 좜λ ₯ μ‹œ 쑰금 λ’€ λ‹€μ‹œ μ‹œλ„
  • λ‰΄μŠ€λ₯Ό 찾을 수 μ—†μ–΄μš”. μ•ˆλ‚΄ 문ꡬ 좜λ ₯ μ‹œ URL이 μ˜¬λ°”λ₯Έμ§€ 확인
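
The notes above can be sketched in code. This is a hedged illustration rather than what `malang_news.py` actually does: the environment variable names `HF_API_TOKEN` and `OPENAI_API_KEY`, the helper names, and the choice to treat HTTP status 503 as "model still loading" are all assumptions. Reading the keys from the environment avoids hardcoding them in the source file, and the retry helper automates the "try again a little later" step.

```python
import os
import time
from typing import Callable, Tuple


def load_api_keys() -> Tuple[str, str]:
    """Read both keys from environment variables (hypothetical names)."""
    hf_token = os.environ.get("HF_API_TOKEN", "")
    openai_key = os.environ.get("OPENAI_API_KEY", "")
    if not hf_token or not openai_key:
        raise RuntimeError("Set HF_API_TOKEN and OPENAI_API_KEY first.")
    return hf_token, openai_key


def call_with_retry(request: Callable[[], Tuple[int, dict]],
                    max_attempts: int = 5,
                    wait_seconds: float = 10.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call `request` until it stops reporting that the model is loading.

    `request` returns (status_code, json_body). Status 503 is treated as
    "model still loading", which is an assumption about the endpoint.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = request()
        if status != 503:
            return body
        # Wait before the next attempt, but not after the last one.
        if attempt < max_attempts:
            sleep(wait_seconds)
    raise TimeoutError("Model was still loading after all attempts.")
```

Here `request` is any zero-argument callable that performs the HTTP call and returns the status code with the parsed JSON body, so the retry logic stays independent of the HTTP library in use.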

Project structure

Malang_news/
│
├── crawler/
│   ├── headline_crawler_final.py
│   ├── headline_crawler_onlybs.py
│   ├── newneek_crawler.ipynb
│   ├── news_crawler_final.py
│   └── 네이버뉴스_크롤링.ipynb
│
├── model/
│   ├── BART/
│   │   ├── KoBART_navernews.ipynb
│   │   ├── 생성요약_KoBART.ipynb
│   │   └── 추출요약_KoBART.ipynb
│   │
│   ├── KeyBERT/
│   │   └── keyword_extract.ipynb
│   │
│   └── causalLM/
│       ├── GPTtrain.py
│       └── koalpaca_fine-tuning.ipynb
│
├── preprocessing/
│   ├── json2csv.ipynb
│   ├── newneek_preprocessing.ipynb
│   └── news_preprocessing_labeling.ipynb
│
└── streamlit/
    ├── malang_news.py
    └── utils.py

Dataset

  • Naver News - Finance
  • Naver News IT/Science headline news
  • Korean SmileStyle Dataset

Model

Document summarization

Keyword extraction

Keyword Q&A

Speech-style conversion

Tech stack