# ✈️ Planning to Study in Switzerland? Just ask!

### Where is the data from?
- blog.naver.com/imyourbest (89% 직장인 일지, "89% Office Worker's Diary")

### Why is it useful?
- 초코빵 finished her master's in Zurich (2020-2025) and documented her full preparation journey on her Naver blog.
- So many people have asked her for tips that she built a little Q&A bot to answer your questions!

### Copyright
- All blog posts (data) in this Q&A project were written and created by 초코빵 herself.
- Unauthorized copying or reproduction of any part of this project is strictly prohibited without prior permission.

## Setup

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
import os
path = '/content/drive/MyDrive/NLP_AI_Study/Projects/01swiss_study_abroad_prep_QA'
os.chdir(path)

packages

In [8]:
!pip install -q python-dotenv==1.0.1 numpy==1.26.4 pandas==2.2.2
!pip install -qU llama-index llama-index-llms-openai llama-index-embeddings-openai

API key

In [5]:
from helper import get_openai_api_key
OPENAI_API_KEY = get_openai_api_key()
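
The `helper` module is not shown in this notebook. As a hedged sketch only, a `get_openai_api_key` helper could look like the following, assuming the key is stored under the name `OPENAI_API_KEY` in the environment or in a `.env` file (which would fit the `python-dotenv` install above). The function body, the variable name, and the error message are assumptions, not the author's actual code.

```python
import os


def get_openai_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, loading a .env file if possible."""
    try:
        # python-dotenv is optional in this sketch: if it is missing, only
        # variables already present in the environment are considered.
        from dotenv import find_dotenv, load_dotenv

        # Look for a .env file starting from the current working directory.
        load_dotenv(find_dotenv(usecwd=True))
    except ImportError:
        pass

    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; add it to .env or the environment")
    return key
```

Failing loudly when the key is missing keeps the error close to the setup cell instead of surfacing later as an authentication failure in the first query.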

async support (allow nested event loops inside the notebook)

In [6]:
import nest_asyncio
nest_asyncio.apply()
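
Why this cell is needed: a Colab or Jupyter kernel already runs an asyncio event loop, and stock asyncio refuses to start a second one from inside it. The standard-library sketch below reproduces that restriction; it does not call `nest_asyncio` itself, and the function names `inner` and `outer` are made up for illustration.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    # We are now inside a running event loop, like code in a notebook cell.
    coro = inner()
    try:
        # Starting a nested loop is rejected by plain asyncio.
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"blocked: {exc}"
    return "ok"


result = asyncio.run(outer())
print(result)
```

`nest_asyncio.apply()` patches asyncio so that this kind of nested call is allowed, which is what lets synchronous calls such as `query_engine.query(...)` drive async work inside the notebook.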

## Prepare contents of blog postings

In [9]:
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader(input_files=["swiss_study_abroad_prep.pdf"]).load_data()

## Set up language and embedding models for answering study-abroad questions

In [10]:
from llama_index.core.node_parser import SentenceSplitter

splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
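
`SentenceSplitter(chunk_size=1024)` packs whole sentences into chunks of at most roughly 1024 tokens. The sketch below illustrates the idea with the standard library only. It counts whitespace-separated words instead of tokens and ignores chunk overlap, and the name `split_into_chunks` is hypothetical, so treat it as a simplified illustration rather than LlamaIndex's actual algorithm.

```python
import re
from typing import List


def split_into_chunks(text: str, chunk_size: int = 50) -> List[str]:
    """Greedily pack whole sentences into chunks of at most chunk_size words."""
    # Naive sentence boundary: ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Close the current chunk if this sentence would overflow it.
        if current and current_len + n_words > chunk_size:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        # A single sentence longer than chunk_size becomes its own chunk.
        current.append(sentence)
        current_len += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


print(split_into_chunks("A b c. D e f. G h i.", chunk_size=6))
```

Keeping sentences intact means each chunk stays readable on its own, which tends to help both embedding quality and the answers generated from retrieved chunks.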

In [11]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

## Set up two Q&A tools based on the type of question:
- summary_tool: when you want a summary
- vector_tool: when you have a specific question

In [12]:
from llama_index.core import SummaryIndex, VectorStoreIndex

summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)

In [13]:
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
vector_query_engine = vector_index.as_query_engine()
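
The two engines differ in what they read: the summary engine feeds every node to the LLM, while the vector engine first retrieves only the nodes whose embeddings are closest to the question. A minimal sketch of that retrieval step follows, using hand-made 3-dimensional vectors in place of real OpenAI embeddings. The node names, the vectors, and `k=2` are illustrative assumptions.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: List[float], node_vecs: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the names of the k nodes most similar to the query vector."""
    ranked = sorted(node_vecs, key=lambda name: cosine(query_vec, node_vecs[name]), reverse=True)
    return ranked[:k]


# Toy "embeddings": one axis per topic.
toy_nodes = {
    "visa": [1.0, 0.0, 0.0],
    "housing": [0.0, 1.0, 0.0],
    "budget": [0.0, 0.0, 1.0],
}
question_vec = [0.1, 0.9, 0.2]  # mostly about housing
print(top_k(question_vec, toy_nodes))
```

Only the retrieved nodes are passed to the LLM, which is why the vector engine suits specific questions and the summary engine suits whole-document overviews.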

In [14]:
from llama_index.core.tools import QueryEngineTool

summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description=(
        "When you want a summarization"
    ),
)

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description=(
        "When you have a specific question"
    ),
)

## Automatically select the appropriate Q&A tool based on the question

In [15]:
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ],
    # verbose=True,
)
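
To make the routing step concrete, here is a toy router built with the standard library only. The real `LLMSingleSelector` asks the LLM to choose a tool from the tool descriptions; this sketch swaps in a simple keyword check, and `ToyTool`, `keyword_selector`, and `route` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyTool:
    name: str
    description: str
    run: Callable[[str], str]


def keyword_selector(question: str, tools: List[ToyTool]) -> ToyTool:
    """Stand-in for the LLM selector: match summary-like questions to the
    tool whose description mentions summarization, everything else to the other."""
    keywords = ("summary", "summarize", "overview", "요약")
    wants_summary = any(word in question.lower() for word in keywords)
    for tool in tools:
        is_summary_tool = "summar" in tool.description.lower()
        if is_summary_tool == wants_summary:
            return tool
    return tools[0]  # fall back to the first tool if nothing matches


def route(question: str, tools: List[ToyTool]) -> str:
    tool = keyword_selector(question, tools)
    return f"[{tool.name}] {tool.run(question)}"


toy_tools = [
    ToyTool("summary_tool", "When you want a summarization", lambda q: "summary of all posts"),
    ToyTool("vector_tool", "When you have a specific question", lambda q: "answer from the closest chunks"),
]
print(route("Give me a summary of the visa process", toy_tools))
print(route("How much is the monthly rent?", toy_tools))
```

Because the real selector relies on the descriptions, clear and distinct tool descriptions are what make the routing reliable.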

## Q&A examples

In [16]:
# Question (Korean): "What documents are needed to prepare for studying in Switzerland?"
response = query_engine.query("스위스 유학 준비 서류는?")
print(str(response))

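학교 지원서, 학업 이력서, 성적 증명서, 추천서, 영어 시험 성적표, 학위증명서, 자기 소개서, 비자 서류, 합격증명서, 비자 신청에 필요한 서류, 재정증빙서류, 보험증서(영문), 여권사본, 숙소 예약 확인서, 국제학생증, 외국환은행 지정 등록서 등이 필요할 수 있습니다.
(English: You may need a school application form, academic CV, transcripts, letters of recommendation, English test score report, degree certificate, personal statement, visa documents, certificate of admission, documents required for the visa application, proof of financial means, insurance certificate (in English), passport copy, accommodation booking confirmation, international student ID card, designated foreign exchange bank registration form, and so on.)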


In [17]:
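# Question (Korean): "What was the hardest part of preparing to study abroad? Please answer in Korean."
response = query_engine.query(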
    "유학을 준비하면서 가장 힘들었던 건 뭐였어? 한국어로 대답해줘."
)
print(str(response))

취리히로의 유학 준비 과정에서 가장 힘들었던 것은 숙소를 구하는 과정이었습니다.
(English: The hardest part of preparing to study in Zurich was finding accommodation.)


# Your turn! 🤖 Ask what you want!
🔎 If the answer comes back in English, the topic most likely was not found in the blog posts, so you can use that as a cue to avoid hallucinated answers! 👍🏻
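
The tip above can be turned into a rough automatic check: measure how much of the answer is written in Hangul and flag answers that are mostly not Korean. This is only a heuristic sketch based on the author's observation. It assumes the question was asked in Korean, and the function names and the 0.5 threshold are arbitrary illustrative choices.

```python
def hangul_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are Hangul syllables (U+AC00 to U+D7A3)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hangul = sum(1 for ch in letters if "\uac00" <= ch <= "\ud7a3")
    return hangul / len(letters)


def probably_from_posts(answer: str, threshold: float = 0.5) -> bool:
    """Heuristic: a mostly Korean answer is more likely grounded in the Korean blog posts."""
    return hangul_ratio(answer) >= threshold


print(probably_from_posts("숙소를 구하는 과정이었습니다."))
print(probably_from_posts("I could not find this in the provided context."))
```

A `False` result does not prove a hallucination; it only signals that the answer deserves a second look against the original posts.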

In [25]:
# Prompt (Korean): "What would you like to know? 🤗"
user_question = input("무엇이 궁금한가요?🤗 : ")
response = query_engine.query(user_question)
# Output label (Korean): "Answer 🤖"
print("답변🤖 :", str(response))

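무엇이 궁금한가요?🤗 : 스위스 유학하면서 쓸 예산을 얼마 정도로 잡았어?
답변🤖 : 스위스 유학하면서 쓸 예산을 총 7천~8천만원 정도로 잡았다.
(English: Q: "How much did you budget for studying in Switzerland?" A: "I budgeted about 70 to 80 million won in total for studying in Switzerland.")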
