Todo Steps:
- Generate links
    - Get Groq to generate N interesting topics
    - Generate Wikipedia URLs
- Get wiki page content data
- Save as .txt files

### Generate Wiki Links via Groq

In [1]:
import os
api_key = os.getenv('GROQ_API_KEY')

In [2]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq

In [10]:
chat = ChatGroq(temperature=0, groq_api_key=api_key, model_name="llama-3.3-70b-versatile")

In [30]:
prompt = """
Generate a list of 10 wikipedia urls for interesting topics related to science and modern technology.
Topics should have short page content.
Output format: 
<list>
[
  "https://en.wikipedia.org/wiki/[topic-1]",
  "https://en.wikipedia.org/wiki/[topic-2]",
...
]
</list>
"""

response = chat.invoke(prompt)
print(response.content)

<list>
[
  "https://en.wikipedia.org/wiki/FAST_(neutron_reactor)",
  "https://en.wikipedia.org/wiki/Quantum_entanglement_swapping",
  "https://en.wikipedia.org/wiki/Memristor",
  "https://en.wikipedia.org/wiki/Nanocellulose",
  "https://en.wikipedia.org/wiki/Graphene_nanosheet",
  "https://en.wikipedia.org/wiki/Synthetic_biology",
  "https://en.wikipedia.org/wiki/Neuromorphic_engineering",
  "https://en.wikipedia.org/wiki/Spintronics",
  "https://en.wikipedia.org/wiki/Topological_quantum_computer",
  "https://en.wikipedia.org/wiki/Metamaterial"
]
</list>


In [31]:
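# str.strip() removes a set of characters, not a substring, so drop the tags explicitly (Python 3.9+)
urls = response.content.strip().removeprefix("<list>").removesuffix("</list>").strip()
type(urls)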

str

In [32]:
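import ast
# literal_eval parses the list literal without executing arbitrary model output
urls = ast.literal_eval(urls)
type(urls)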

list
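
The cells above trim the `<list>` tags and convert the remaining string to a Python list. If the model adds any text before or after the tags, that approach can fail. A minimal sketch of a more tolerant parser follows, assuming the reply keeps the `<list>[...]</list>` shape requested in the prompt. The helper name `extract_url_list` and the `sample` string are hypothetical and not part of the original notebook.

```python
import json
import re


def extract_url_list(text: str) -> list[str]:
    """Return the JSON list found between <list> and </list> tags."""
    # DOTALL lets '.' span the newlines inside the bracketed list.
    match = re.search(r"<list>\s*(\[.*?\])\s*</list>", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no <list>[...]</list> block found in the response")
    items = json.loads(match.group(1))
    if not all(isinstance(item, str) for item in items):
        raise ValueError("expected a list of strings")
    return items


# Hypothetical reply with extra chatter around the tagged list.
sample = """Here you go:
<list>
[
  "https://en.wikipedia.org/wiki/Memristor",
  "https://en.wikipedia.org/wiki/Spintronics"
]
</list>
"""
print(extract_url_list(sample))
```

In the notebook this would be called as `extract_url_list(response.content)`. It uses `json.loads`, so it expects double-quoted strings as in the output shown earlier.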

In [33]:
urls

['https://en.wikipedia.org/wiki/FAST_(neutron_reactor)',
 'https://en.wikipedia.org/wiki/Quantum_entanglement_swapping',
 'https://en.wikipedia.org/wiki/Memristor',
 'https://en.wikipedia.org/wiki/Nanocellulose',
 'https://en.wikipedia.org/wiki/Graphene_nanosheet',
 'https://en.wikipedia.org/wiki/Synthetic_biology',
 'https://en.wikipedia.org/wiki/Neuromorphic_engineering',
 'https://en.wikipedia.org/wiki/Spintronics',
 'https://en.wikipedia.org/wiki/Topological_quantum_computer',
 'https://en.wikipedia.org/wiki/Metamaterial']
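
A language model can return URLs that are malformed, duplicated, or still contain an unfilled placeholder from the prompt. The sketch below is one possible offline sanity check before loading. It only inspects the shape of each URL and does not verify that the page exists. The names `is_wiki_article_url` and `clean_urls` are hypothetical helpers, not part of the original notebook.

```python
from urllib.parse import urlparse

WIKI_HOST = "en.wikipedia.org"


def is_wiki_article_url(url: str) -> bool:
    """Check that a URL looks like https://en.wikipedia.org/wiki/<title>."""
    parts = urlparse(url)
    title = parts.path.removeprefix("/wiki/")
    return (
        parts.scheme == "https"
        and parts.netloc == WIKI_HOST
        and parts.path.startswith("/wiki/")
        and title != ""
        and "[" not in title  # unfilled placeholder such as [topic-1]
    )


def clean_urls(urls):
    """Keep well-formed article URLs, dropping duplicates but preserving order."""
    seen = set()
    kept = []
    for url in urls:
        if is_wiki_article_url(url) and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```

A call such as `urls = clean_urls(urls)` could run before the loader step.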

### Get URL Content

In [21]:
from langchain_community.document_loaders import UnstructuredURLLoader

In [34]:
loader = UnstructuredURLLoader(urls=urls)
data = loader.load()

In [41]:
print(data[1].page_content)

Entanglement swapping

Add links

From Wikipedia, the free encyclopedia

(Redirected from Quantum entanglement swapping)

Quantum mechanics idea

In quantum mechanics, entanglement swapping is a protocol to transfer quantum entanglement from one pair of particles to another, even if the second pair of particles have never interacted. This process may have application in quantum communication networks and quantum computing.

Concept

[edit]

Basic principles

[edit]

Entanglement swapping has two pairs of entangled particles: (A, B) and (C, D). Pair of particles (A, B) is initially entangled, as is the pair (C, D). The pair (B, C) taken from the original pairs, is projected onto one of the four possible Bell states, a process called a Bell state measurement. The unmeasured pair of particles (A, D) can become entangled. This effect happens without any previous direct interaction between particles A and D.[2][3]

Entanglement swapping is a form of quantum teleportation. In quantum telepor
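
The printed page content includes site residue such as `[edit]`, `Add links`, and the "From Wikipedia, the free encyclopedia" banner. A minimal sketch for dropping those lines before saving follows. It assumes the residue always appears on lines of its own, as in the output above. The set `BOILERPLATE_LINES` and the function `strip_wiki_residue` are hypothetical, and the set covers only the strings seen in this one page.

```python
# Lines observed in the printed output that carry no article content.
BOILERPLATE_LINES = {
    "[edit]",
    "Add links",
    "From Wikipedia, the free encyclopedia",
}


def strip_wiki_residue(text: str) -> str:
    """Drop blank lines and known residue lines, then rejoin as paragraphs."""
    kept = [
        line
        for line in text.splitlines()
        if line.strip() and line.strip() not in BOILERPLATE_LINES
    ]
    return "\n\n".join(kept)


sample_page = (
    "Entanglement swapping\n\nAdd links\n\n"
    "From Wikipedia, the free encyclopedia\n\n"
    "Concept\n\n[edit]\n\nBasic principles"
)
print(strip_wiki_residue(sample_page))
```

Applied to the loaded documents, this could look like `[strip_wiki_residue(doc.page_content) for doc in data]`.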

### Save as .txt

In [43]:
type(data[0].page_content)

str

In [54]:
doc_word_count = [len(doc.page_content.split(" ")) for doc in data]
doc_word_count

[215, 1205, 12877, 7743, 211, 18133, 4688, 3108, 3549, 9932]

In [59]:
# cap each document at 3000 words
smaller_docs = [" ".join(doc.page_content.split(" ")[:3000]) for doc in data]
small_doc_word_count = [len(doc.split(" ")) for doc in smaller_docs]
small_doc_word_count

[215, 1205, 3000, 3000, 211, 3000, 3000, 3000, 3000, 3000]
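
The cell above splits on single spaces, so a run of text joined only by newlines counts as one word. The sketch below shows an alternative that counts whitespace-delimited words and slices the original string, which leaves line breaks intact. The helper `truncate_words` is hypothetical, and its counts will generally differ from the `split(" ")` counts shown above.

```python
import re


def truncate_words(text: str, max_words: int) -> str:
    """Return text cut after its first max_words whitespace-delimited words."""
    if max_words <= 0:
        return ""
    for count, match in enumerate(re.finditer(r"\S+", text), start=1):
        if count == max_words:
            # Slice the original string so spacing and newlines are preserved.
            return text[: match.end()]
    # The text has fewer than max_words words, so keep all of it.
    return text


print(repr(truncate_words("a b\n\nc d", 3)))
```

In the notebook this could replace the list comprehension with `[truncate_words(doc.page_content, 3000) for doc in data]`.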

In [60]:
file_names = [url.split("https://en.wikipedia.org/wiki/")[1] for url in urls]

In [63]:
# create directory
dir_name = "data"
os.makedirs(dir_name, exist_ok=True)

# save each string as a UTF-8 .txt file named after its page title
for name, text in zip(file_names, smaller_docs):
    with open(os.path.join(dir_name, f"{name}.txt"), "w", encoding="utf-8") as f:
        f.write(text)
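
Page titles taken straight from a URL can contain characters that are awkward in file names, such as `/` or percent-encoded sequences. A minimal sketch for deriving a safer name follows. The helper `url_to_filename` is hypothetical, and the choice to keep word characters, parentheses, dots, and hyphens is an assumption rather than a rule from the original notebook.

```python
import re
from urllib.parse import unquote, urlparse


def url_to_filename(url: str) -> str:
    """Turn a Wikipedia article URL into a .txt file name."""
    # Decode percent-escapes and drop the /wiki/ prefix to get the title.
    title = unquote(urlparse(url).path.removeprefix("/wiki/"))
    # Replace every run of other characters with a single underscore.
    safe = re.sub(r"[^\w().-]+", "_", title).strip("_")
    return f"{safe or 'untitled'}.txt"


print(url_to_filename("https://en.wikipedia.org/wiki/FAST_(neutron_reactor)"))
```

The titles in this run pass through unchanged, so the helper only matters for titles with unusual characters.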