In [2]:
%run "../01-check_setup.ipynb"

SAP HANA Client for Python: 2.23.25021400
Connected to SAP HANA db version 4.00.000.00.1739875052 (fa/CE2024.40) 
at c5889dd5-e0f6-4930-8408-94d53ca61dbf.hna0.prod-us10.hanacloud.ondemand.com:443 as CODEJAMHANAAI00
Current time on the SAP HANA server: 2025-02-26 21:46:03.835000


In [3]:
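import os
import requests

# Set USER_AGENT to the default user agent that `requests` sends, as echoed back by httpbin.org
os.environ["USER_AGENT"] = requests.get("https://httpbin.org/user-agent", timeout=30).json()["user-agent"]
os.environ["USER_AGENT"]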

'python-requests/2.32.3'

In [4]:
import zipfile
import requests
from io import BytesIO

# URL of the zipped file
zip_url = "https://docs.python.org/3/archives/python-3.13-docs-text.zip"

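# Download the zipped file, failing loudly on HTTP errors instead of parsing an error page
response = requests.get(zip_url, timeout=60)
response.raise_for_status()
zip_file = BytesIO(response.content)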


In [5]:

# Load documents from the zipped file
documents_from_zip = []
with zipfile.ZipFile(zip_file, 'r') as z:
    for file_name in z.namelist():
        if not z.getinfo(file_name).is_dir():
            with z.open(file_name) as f:
                documents_from_zip.append({'metadata': {'file_name': file_name}, 'content': f.read().decode('utf-8')})

# Check the number of documents loaded
print(f"Number of documents loaded: {len(documents_from_zip)}")

Number of documents loaded: 507
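The loading loop above can also be wrapped in a reusable function. The sketch below is a hypothetical helper (`load_text_documents` is not part of the original notebook) that iterates over `ZipFile.infolist()` directly instead of calling `getinfo()` for every name. It is exercised on a small in-memory archive rather than on the real download.

```python
import zipfile
from io import BytesIO


def load_text_documents(zip_bytes: bytes, encoding: str = "utf-8") -> list[dict]:
    """Return one {'metadata': {...}, 'content': str} dict per file in the archive."""
    documents = []
    with zipfile.ZipFile(BytesIO(zip_bytes)) as z:
        # infolist() yields ZipInfo objects, so no separate getinfo() lookup is needed
        for info in z.infolist():
            if info.is_dir():
                continue  # skip directory entries
            text = z.read(info).decode(encoding)
            documents.append({"metadata": {"file_name": info.filename}, "content": text})
    return documents


# A tiny in-memory archive standing in for the Python docs download
buffer = BytesIO()
with zipfile.ZipFile(buffer, "w") as z:
    z.writestr("docs/faq/", "")  # a directory entry, which the helper skips
    z.writestr("docs/faq/general.txt", "General FAQ")
    z.writestr("docs/tutorial/index.txt", "Tutorial")

docs = load_text_documents(buffer.getvalue())
print(len(docs))  # 2
```

With the notebook's download, the equivalent call would be `load_text_documents(response.content)`.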


In [6]:
## Select only records where file_name contains '/faq/'
faq_documents = [doc for doc in documents_from_zip if '/faq/' in doc['metadata']['file_name']]  
print(f"Number of FAQ documents: {len(faq_documents)}")

Number of FAQ documents: 9
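Before filtering on `/faq/`, it can help to see which other top-level sections the archive contains. A minimal sketch, assuming member names of the form `<archive root>/<section>/<file>.txt` as in the file names shown in this notebook; `count_by_section` is a hypothetical helper and the sample names are illustrative.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_by_section(file_names: list[str]) -> Counter:
    """Count files per top-level folder below the archive root (e.g. 'faq', 'library')."""
    sections = Counter()
    for name in file_names:
        parts = PurePosixPath(name).parts
        # parts[0] is the archive root folder; files directly inside it are labeled '(root)'
        sections[parts[1] if len(parts) > 2 else "(root)"] += 1
    return sections


# Illustrative member names in the style of the Python docs archive
sample = [
    "python-3.13-docs-text/faq/installed.txt",
    "python-3.13-docs-text/faq/windows.txt",
    "python-3.13-docs-text/library/zipfile.txt",
    "python-3.13-docs-text/contents.txt",
]
print(count_by_section(sample))
```

Against the real data, the call would be `count_by_section([d['metadata']['file_name'] for d in documents_from_zip])`.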


In [15]:
print(faq_documents[2]['content'])

Library and Extension FAQ
*************************


General Library Questions


How do I find a module or application to perform task X?
--------------------------------------------------------

Check the Library Reference to see if there's a relevant standard
library module.  (Eventually you'll learn what's in the standard
library and will be able to skip this step.)

For third-party packages, search the Python Package Index or try
Google or another web search engine.  Searching for "Python" plus a
keyword or two for your topic of interest will usually find something
helpful.


Where is the math.py (socket.py, regex.py, etc.) source file?
-------------------------------------------------------------

If you can't find a source file for a module it may be a built-in or
dynamically loaded module implemented in C, C++ or other compiled
language. In this case you may not have the source file or it may be
something like "mathmodule.c", somewhere in a C source directory (not
on the Python

In [8]:
import pandas as pd

## Convert faq_documents to a DataFrame `df_faq`
df_faq = pd.DataFrame(faq_documents)


In [17]:
import markdown
from IPython.display import HTML

# Convert the first FAQ from Markdown to HTML: the setext-style underlines in the
# text files ('===' and '---') become <h1> and <h2> headers
html = markdown.markdown(df_faq['content'][0])

# Show the raw HTML string
display(html)


'<p>"Why is Python Installed on my Computer?" FAQ</p>\n<hr />\n<h1>What is Python?</h1>\n<p>Python is a programming language.  It\'s used for many different\napplications. It\'s used in some high schools and colleges as an\nintroductory programming language because Python is easy to learn, but\nit\'s also used by professional software developers at places such as\nGoogle, NASA, and Lucasfilm Ltd.</p>\n<p>If you wish to learn more about Python, start with the Beginner\'s\nGuide to Python.</p>\n<h1>Why is Python installed on my machine?</h1>\n<p>If you find Python installed on your system but don\'t remember\ninstalling it, there are several possible ways it could have gotten\nthere.</p>\n<ul>\n<li>\n<p>Perhaps another user on the computer wanted to learn programming and\n  installed it; you\'ll have to figure out who\'s been using the machine\n  and might have installed it.</p>\n</li>\n<li>\n<p>A third-party application installed on the machine might have been\n  written in Python and i
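In the HTML above, `===` underlines from the text file became `<h1>` tags, while the `****` line under the title became `<hr />`. The sketch below previews which lines Markdown treats as setext headings, using a simplified regex. `setext_headings` is a hypothetical helper, and real Markdown parsing has more rules (for example, a multi-line paragraph directly above an underline).

```python
import re

# A non-blank text line followed by a line made only of '=' or '-' characters
SETEXT = re.compile(r"^(?P<title>\S.*)\n(?P<rule>=+|-+)[ \t]*$", re.MULTILINE)


def setext_headings(text: str) -> list[tuple[int, str]]:
    """Return (level, title) pairs; '=' underlines map to level 1, '-' to level 2."""
    return [
        (1 if m.group("rule")[0] == "=" else 2, m.group("title").strip())
        for m in SETEXT.finditer(text)
    ]


sample = (
    '"Why is Python Installed on my Computer?" FAQ\n'
    "*********************************************\n\n\n"
    "What is Python?\n"
    "===============\n\n"
    "Python is a programming language.\n"
)
print(setext_headings(sample))  # [(1, 'What is Python?')]
```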

In [18]:
# Convert the metadata dicts to strings so they can be stored in an NVARCHAR column
df_faq['metadata'] = df_faq['metadata'].astype(str)
# Add an HTML rendering of each FAQ's content
df_faq['content_html'] = df_faq['content'].apply(markdown.markdown)
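`astype(str)` stores each metadata dict as its Python repr, with single quotes, which is what the `metadata` column shows after the upload further down. If the column should hold valid JSON instead, `json.dumps` is one alternative. The sketch below contrasts the two on the first FAQ's metadata.

```python
import ast
import json

metadata = {"file_name": "python-3.13-docs-text/faq/installed.txt"}

as_repr = str(metadata)         # what .astype(str) stores: the Python repr, single-quoted
as_json = json.dumps(metadata)  # a JSON document, double-quoted

print(as_repr)  # {'file_name': 'python-3.13-docs-text/faq/installed.txt'}
print(as_json)  # {"file_name": "python-3.13-docs-text/faq/installed.txt"}

# The repr is not valid JSON, so a JSON parser rejects it ...
try:
    json.loads(as_repr)
except json.JSONDecodeError as err:
    print("not valid JSON:", err.msg)

# ... while the JSON string round-trips; the repr can only be read back as a Python literal
assert json.loads(as_json) == metadata
assert ast.literal_eval(as_repr) == metadata
```

In the notebook this would read `df_faq['metadata'] = df_faq['metadata'].apply(json.dumps)`; whether JSON is preferable depends on how the column is queried later.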

In [19]:
display(df_faq.dtypes)
df_faq=df_faq.convert_dtypes()
display(df_faq.dtypes)

metadata        object
content         object
content_html    object
dtype: object

metadata        string[python]
content         string[python]
content_html    string[python]
dtype: object

In [20]:
hana_table_name="FAQ_HTML"

In [21]:
## Create a table in SAP HANA from `df_faq`
from hana_ml.dataframe import create_dataframe_from_pandas
hdf_faq_bronze = create_dataframe_from_pandas(connection_context=myconn, 
                                              pandas_df=df_faq, 
                                              table_name=hana_table_name, 
                                              force=True,
                                              object_type_as_bin=True,
                                              table_structure={'metadata': 'NVARCHAR(5000)',
                                                               'content': 'NCLOB',
                                                               'content_html': 'NCLOB'}
                                              )

100%|██████████| 1/1 [00:00<00:00,  1.63it/s]
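With `table_structure`, the column types are set explicitly instead of being inferred from the pandas dtypes. The hypothetical helper below builds a roughly equivalent `CREATE COLUMN TABLE` statement to make that mapping visible; it is an illustration, not necessarily the exact DDL that hana-ml generates. Because the column names are quoted and lowercase, SQL against this table has to use quoted identifiers such as `"content_html"`, since SAP HANA converts unquoted identifiers to uppercase.

```python
def create_table_ddl(table_name: str, table_structure: dict[str, str]) -> str:
    """Build a CREATE COLUMN TABLE statement from a {column name: SQL type} mapping."""
    # Quoted identifiers keep their case in SAP HANA, so "metadata" stays lowercase
    columns = ",\n    ".join(f'"{name}" {sql_type}' for name, sql_type in table_structure.items())
    return f'CREATE COLUMN TABLE "{table_name}" (\n    {columns}\n)'


print(create_table_ddl(
    "FAQ_HTML",
    {"metadata": "NVARCHAR(5000)", "content": "NCLOB", "content_html": "NCLOB"},
))
```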


In [25]:
import pandas as pd
pd.set_option('max_colwidth', 256) 

hdf_faq_bronze.head().collect().T

,0
metadata,{'file_name': 'python-3.13-docs-text/faq/installed.txt'}
content,"""Why is Python Installed on my Computer?"" FAQ\n*********************************************\n\n\nWhat is Python?\n===============\n\nPython is a programming language. It's used for many different\napplications. It's used in some high schools and coll..."
content_html,"<p>""Why is Python Installed on my Computer?"" FAQ</p>\n<hr />\n<h1>What is Python?</h1>\n<p>Python is a programming language. It's used for many different\napplications. It's used in some high schools and colleges as an\nintroductory programming langua..."


## Text splitting: preparing FAQs for vectorization

In [26]:
hdf_faq_bronze.add_id().head().collect().T

,0
ID,1
metadata,{'file_name': 'python-3.13-docs-text/faq/installed.txt'}
content,"""Why is Python Installed on my Computer?"" FAQ\n*********************************************\n\n\nWhat is Python?\n===============\n\nPython is a programming language. It's used for many different\napplications. It's used in some high schools and coll..."
content_html,"<p>""Why is Python Installed on my Computer?"" FAQ</p>\n<hr />\n<h1>What is Python?</h1>\n<p>Python is a programming language. It's used for many different\napplications. It's used in some high schools and colleges as an\nintroductory programming langua..."


In [32]:
# Applying the Text Splitter with recursive-splitting, available with hana-ml 2.23
from hana_ml.text.text_splitter import TextSplitter

splitter = TextSplitter(split_type='recursive', doc_type='html')
splitter._extend_pal_parameter({'GLOBAL_SEPARATOR':'[<h]', 'KEEP_SEPARATOR':1})
hdf_faq_silver = splitter.split_text(
    hdf_faq_bronze.add_id().select('ID', 'content_html'), 
    order_status=True
    )

In [52]:
# Re-apply the Text Splitter, now splitting the HTML document at header tags <h1> to <h6> (TextSplitter is available with hana-ml 2.23)
from hana_ml.text.text_splitter import TextSplitter

splitter = TextSplitter(split_type='document', doc_type='html', keep_separator=True, overlap=4)
splitter._extend_pal_parameter({'GLOBAL_SEPARATOR':'[<h1>,<h2>,<h3>,<h4>,<h5>,<h6>]'})
hdf_faq_silver = splitter.split_text(
    hdf_faq_bronze.add_id().select('ID', 'content_html'), 
    order_status=True
    )

In [53]:
import pandas as pd
pd.set_option('max_colwidth', None) 

print(hdf_faq_silver.shape)
display(splitter.statistics_.collect())
# Add each chunk's length in characters as CHUNK_SIZE
display(hdf_faq_silver.select("*", ('LENGTH("CONTENT")', "CHUNK_SIZE")).head(10).collect())

[205, 3]


,STAT_NAME,STAT_VALUE
0,GLOBAL_SEPARATOR_LIST,"{""Separator"":""[<h1>,<h2>,<h3>,<h4>,<h5>,<h6>]""}"


,ID,SUB_ID,CONTENT,CHUNK_SIZE
0,1,0,"<p>""Why is Python Installed on my Computer?"" FAQ</p>\n<hr />\n<h1>",64
1,1,1,"What is Python?</h1>\n<p>Python is a programming language. It's used for many different\napplications. It's used in some high schools and colleges as an\nintroductory programming language because Python is easy to learn, but\nit's also used by professional software developers at places such as\nGoogle, NASA, and Lucasfilm Ltd.</p>\n<p>If you wish to learn more about Python, start with the Beginner's\nGuide to Python.</p>\n<h1>",423
2,1,2,"Why is Python installed on my machine?</h1>\n<p>If you find Python installed on your system but don't remember\ninstalling it, there are several possible ways it could have gotten\nthere.</p>\n<ul>\n<li>\n<p>Perhaps another user on the computer wanted to learn programming and\n installed it; you'll have to figure out who's been using the machine\n and might have installed it.</p>\n</li>\n<li>\n<p>A third-party application installed on the machine might have been\n written in Python and included a Python installation. There are\n many such applications, from GUI programs to network servers and\n administrative scripts.</p>\n</li>\n<li>\n<p>Some Windows machines also have Python installed. At this writing\n we're aware of computers from Hewlett-Packard and Compaq that\n include Python. Apparently some of HP/Compaq's administrative tools\n are written in Python.</p>\n</li>\n<li>\n<p>Many Unix-compatible operating systems, such as macOS and some Linux\n distributions, have Python installed by default; it's included in\n the base installation.</p>\n</li>\n</ul>\n<h1>",1063
3,1,3,"Can I delete Python?</h1>\n<p>That depends on where Python came from.</p>\n<p>If someone installed it deliberately, you can remove it without\nhurting anything. On Windows, use the Add/Remove Programs icon in the\nControl Panel.</p>\n<p>If Python was installed by a third-party application, you can also\nremove it, but that application will no longer work. You should use\nthat application's uninstaller rather than removing Python directly.</p>\n<p>If Python came with your operating system, removing it is not\nrecommended. If you remove it, whatever tools were written in Python\nwill no longer run, and some of them might be important to you.\nReinstalling the whole system would then be required to fix things\nagain.</p>",718
4,2,4,<p>Python on Windows FAQ</p>\n<hr />\n<h1>,40
5,2,5,"How do I run a Python program under Windows?</h1>\n<p>This is not necessarily a straightforward question. If you are already\nfamiliar with running programs from the Windows command line then\neverything will seem obvious; otherwise, you might need a little more\nguidance.</p>\n<p>Unless you use some sort of integrated development environment, you\nwill end up <em>typing</em> Windows commands into what is referred to as a\n""Command prompt window"". Usually you can create such a window from\nyour search bar by searching for ""cmd"". You should be able to\nrecognize when you have started such a window because you will see a\nWindows ""command prompt"", which usually looks like this:</p>\n<p>C:&gt;</p>\n<p>The letter may be different, and there might be other things after it,\nso you might just as easily see something like:</p>\n<p>D:\YourName\Projects\Python&gt;</p>\n<p>depending on how your computer has been set up and what else you have\nrecently done with it. Once you have started such a window, you are\nwell on the way to running Python programs.</p>\n<p>You need to realize that your Python scripts have to be processed by\nanother program called the Python <em>interpreter</em>. The interpreter\nreads your script, compiles it into bytecodes, and then executes the\nbytecodes to run your program. So, how do you arrange for the\ninterpreter to handle your Python?</p>\n<p>First, you need to make sure that your command window recognises the\nword ""py"" as an instruction to start the interpreter. 
If you have\nopened a command window, you should try entering the command ""py"" and\nhitting return:</p>\n<p>C:\Users\YourName&gt; py</p>\n<p>You should then see something like:</p>\n<p>Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:04:45) [MSC v.1900 32 bit (Intel)] on win32\n Type ""help"", ""copyright"", ""credits"" or ""license"" for more information.</p>\n<blockquote>\n<blockquote>\n<blockquote></blockquote>\n</blockquote>\n</blockquote>\n<p>You have started the interpreter in ""interactive mode"". That means you\ncan enter Python statements or expressions interactively and have them\nexecuted or evaluated while you wait. This is one of Python's\nstrongest features. Check it by entering a few expressions of your\nchoice and seeing the results:</p>\n<blockquote>\n<blockquote>\n<blockquote>\n<p>print(""Hello"")\n Hello\n""Hello"" * 3\n 'HelloHelloHello'</p>\n</blockquote>\n</blockquote>\n</blockquote>\n<p>Many people use the interactive mode as a convenient yet highly\nprogrammable calculator. When you want to end your interactive Python\nsession, call the ""exit()"" function or hold the ""Ctrl"" key down while\nyou enter a ""Z"", then hit the """"Enter"""" key to get back to your\nWindows command prompt.</p>\n<p>You may also find that you have a Start-menu entry such as Start ‣\nPrograms ‣ Python 3.x ‣ Python (command line) that results in you\nseeing the ""&gt;&gt;&gt;"" prompt in a new window. If so, the window will\ndisappear after you call the ""exit()"" function or enter the ""Ctrl""-""Z""\ncharacter; Windows is running a single ""python"" command in the window,\nand closes it when you terminate the interpreter.</p>\n<p>Now that we know the ""py"" command is recognized, you can give your\nPython script to it. You'll have to give either an absolute or a\nrelative path to the Python script. 
Let's say your Python script is\nlocated in your desktop and is named ""hello.py"", and your command\nprompt is nicely opened in your home directory so you're seeing\nsomething similar to:</p>\n<p>C:\Users\YourName&gt;</p>\n<p>So now you'll ask the ""py"" command to give your script to Python by\ntyping ""py"" followed by your script path:</p>\n<p>C:\Users\YourName&gt; py Desktop\hello.py\n hello</p>\n<h1>",3649
6,2,6,"How do I make Python scripts executable?</h1>\n<p>On Windows, the standard Python installer already associates the .py\nextension with a file type (Python.File) and gives that file type an\nopen command that runs the interpreter (""D:\Program\nFiles\Python\python.exe ""%1"" %*""). This is enough to make scripts\nexecutable from the command prompt as 'foo.py'. If you'd rather be\nable to execute the script by simple typing 'foo' with no extension\nyou need to add .py to the PATHEXT environment variable.</p>\n<h1>",507
7,2,7,"Why does Python sometimes take so long to start?</h1>\n<p>Usually Python starts very quickly on Windows, but occasionally there\nare bug reports that Python suddenly begins to take a long time to\nstart up. This is made even more puzzling because Python will work\nfine on other Windows systems which appear to be configured\nidentically.</p>\n<p>The problem may be caused by a misconfiguration of virus checking\nsoftware on the problem machine. Some virus scanners have been known\nto introduce startup overhead of two orders of magnitude when the\nscanner is configured to monitor all reads from the filesystem. Try\nchecking the configuration of virus scanning software on your systems\nto ensure that they are indeed configured identically. McAfee, when\nconfigured to scan all file system read activity, is a particular\noffender.</p>\n<h1>",835
8,2,8,How do I make an executable from a Python script?</h1>\n<p>See How can I create a stand-alone binary from a Python script? for a\nlist of tools that can be used to make executables.</p>\n<h1>,188
9,2,9,"Is a ""*.pyd"" file the same as a DLL?</h1>\n<p>Yes, .pyd files are dll's, but there are a few differences. If you\nhave a DLL named ""foo.pyd"", then it must have a function\n""PyInit_foo()"". You can then write Python ""import foo"", and Python\nwill search for foo.pyd (as well as foo.py, foo.pyc) and if it finds\nit, will attempt to call ""PyInit_foo()"" to initialize it. You do not\nlink your .exe with foo.lib, as that would cause Windows to require\nthe DLL to be present.</p>\n<p>Note that the search path for foo.pyd is PYTHONPATH, not the same as\nthe path that Windows uses to search for foo.dll. Also, foo.pyd need\nnot be present to run your program, whereas if you linked your program\nwith a dll, the dll is required. Of course, foo.pyd is required if\nyou want to say ""import foo"". In a DLL, linkage is declared in the\nsource code with ""__declspec(dllexport)"". In a .pyd, linkage is\ndefined in a list of available functions.</p>\n<h1>",935
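The chunks above end with the opening `<h1>` tag of the next section, and the following chunk starts with that header's text. A pure-Python sketch of that boundary behavior, using a regex on `<h1>` to `<h6>`. `split_on_headers` is a hypothetical helper; it does not model `overlap` or any further size-based splitting that PAL may apply.

```python
import re

# Match any opening header tag <h1> ... <h6> (not </h1> and not <hr />)
HEADER_TAG = re.compile(r"<h[1-6]>")


def split_on_headers(html: str) -> list[str]:
    """Cut after each opening header tag, keeping the tag at the end of the previous chunk."""
    chunks, start = [], 0
    for match in HEADER_TAG.finditer(html):
        chunks.append(html[start:match.end()])  # include the separator itself
        start = match.end()
    chunks.append(html[start:])  # remainder after the last header tag
    return [c for c in chunks if c]  # drop an empty trailing chunk, if any


sample = (
    '<p>"Why is Python Installed on my Computer?" FAQ</p>\n<hr />\n'
    "<h1>What is Python?</h1>\n<p>Python is a programming language.</p>\n"
    "<h1>Can I delete Python?</h1>\n<p>That depends on where Python came from.</p>"
)
for chunk in split_on_headers(sample):
    print(repr(chunk))
```

A design alternative is to cut before each tag instead, by changing both `match.end()` calls to `match.start()`, so that every chunk carries a complete `<h1>...</h1>` pair; whether that improves retrieval would need to be tested.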


## Generating Text Embeddings in SAP HANA Cloud

In [56]:
content_column = 'CONTENT'

In [55]:
print(f"Number of records selected for further processing: {hdf_faq_silver.count()}")

Number of records selected for further processing: 205
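Chunk sizes vary widely (from 40 to 3649 characters in the first ten rows), and the title-only chunk of document 1 still appears among the top-5 search results further down. The sketch below summarizes sizes per document and flags very short chunks, using the ten (ID, SUB_ID, CHUNK_SIZE) rows shown above; the 100-character threshold is a hypothetical choice.

```python
from collections import defaultdict

# (ID, SUB_ID, CHUNK_SIZE) for the first ten chunks listed above
rows = [
    (1, 0, 64), (1, 1, 423), (1, 2, 1063), (1, 3, 718),
    (2, 4, 40), (2, 5, 3649), (2, 6, 507), (2, 7, 835), (2, 8, 188), (2, 9, 935),
]
MIN_CHUNK_SIZE = 100  # hypothetical cut-off for header-only chunks

sizes_by_doc = defaultdict(list)
for doc_id, _sub_id, size in rows:
    sizes_by_doc[doc_id].append(size)

for doc_id, sizes in sorted(sizes_by_doc.items()):
    # document ID, smallest, largest and mean chunk size in characters
    print(doc_id, min(sizes), max(sizes), round(sum(sizes) / len(sizes), 1))

too_short = [sub_id for _doc_id, sub_id, size in rows if size < MIN_CHUNK_SIZE]
print("candidates to drop:", too_short)  # candidates to drop: [0, 4]
```

On the SAP HANA side, a similar cut could presumably be applied before embedding with hana-ml's `DataFrame.filter`, e.g. `hdf_faq_silver.filter('LENGTH("CONTENT") >= 100')`.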


In [60]:
hdf_faq_silver.get_table_structure()

{'ID': 'INT', 'SUB_ID': 'INT', 'CONTENT': 'NCLOB'}

In [61]:
# Generate text embeddings in SAP HANA Cloud with the new PAL embeddings function (available with hana-ml 2.23)
from hana_ml.text.pal_embeddings import PALEmbeddings
pe = PALEmbeddings()
# SUB_ID is unique per chunk across all documents, so it serves as the key
hdf_faq_gold = pe.fit_transform(hdf_faq_silver, key="SUB_ID", target=[content_column],
                                thread_number=10, batch_size=10)  # , max_token_num=512
print(f"{hdf_faq_gold.count()} records processed in {round(pe.runtime, 3)} sec")

205 records processed in 10.828 sec
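`batch_size=10` suggests that rows are sent to the embedding model in groups of ten; that reading of the parameter is an assumption, not something the output confirms. Under that assumption, the 205 chunks would be processed in 21 batches, as this small sketch of the arithmetic shows (`batched` is a hypothetical helper, similar to `itertools.batched` in Python 3.12+ but yielding lists).

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(items: Iterable, batch_size: int) -> Iterator[list]:
    """Yield consecutive lists of at most batch_size items."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch


sub_ids = range(205)  # 205 chunks, as reported above
batches = list(batched(sub_ids, 10))
print(len(batches), len(batches[-1]))  # 21 5
```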


In [62]:
hdf_faq_gold.get_table_structure()

{'ID': 'INT',
 'SUB_ID': 'INT',
 'CONTENT': 'NCLOB',
 'VECTOR_COL_CONTENT': 'REAL_VECTOR'}

In [64]:
hdf_faq_gold.head(2).collect()

,ID,SUB_ID,CONTENT,VECTOR_COL_CONTENT
0,1,0,"<p>""Why is Python Installed on my Computer?"" FAQ</p>\n<hr />\n<h1>","[0.033143430948257446, 0.030311258509755135, -0.006247238721698523, -0.016939878463745117, 0.045132335275411606, 0.004507202189415693, 0.0017255013808608055, 0.03091788850724697, -0.00022198859369382262, -0.03345530852675438, -0.02756786160171032, 0.018464984372258186, -0.07268993556499481, 0.019762547686696053, -0.003106691874563694, 0.03331155702471733, 0.07276413589715958, -0.01611221954226494, -0.004753657151013613, 0.04678535461425781, -0.03055003471672535, 0.04742305353283882, -0.06467708200216293, 0.007455017417669296, -0.06081252545118332, 0.004512310028076172, -0.0006425511674024165, 0.007020565681159496, -0.057269200682640076, -0.05362165346741676, 0.051400188356637955, 0.014546925202012062, 0.0010902599897235632, -0.036813925951719284, 0.02332942560315132, -0.0015924132894724607, -0.031235583126544952, 0.03840479254722595, -0.03235818073153496, 0.012237551622092724, 3.0626745228801155e-06, -0.01173030398786068, -0.01262391172349453, 0.043833743780851364, -0.059163898229599, 0.02255098894238472, -0.010658904910087585, 0.038852568715810776, 0.003964002709835768, 0.011811654083430767, -0.023250121623277664, 0.04976923018693924, -0.015482660382986069, -0.0678335577249527, 0.025107253342866898, 0.029572248458862305, 0.00831320509314537, -0.05473675951361656, -0.013811982236802578, -0.006573001854121685, 0.03987565264105797, 0.01776598021388054, -0.027233997359871864, 0.012870930135250092, 0.018448250368237495, -0.03847121819853783, 0.0010136663913726807, 0.06266427785158157, -0.029052501544356346, 0.023145483806729317, 0.021631749346852303, 0.038326237350702286, -0.06536207348108292, -0.027366865426301956, 0.03444402664899826, -0.004832293838262558, -0.016421547159552574, 0.05291067063808441, 0.030835695564746857, 0.04059726744890213, 0.020052867010235786, 0.03937257453799248, 0.037940021604299545, 0.05398465320467949, 0.027393177151679993, -0.014652387239038944, 
-0.04105612635612488, -0.0009165100054815412, -0.0017121442360803485, 0.04155111312866211, -0.0247059129178524, -0.03783412650227547, 0.04741296544671059, 0.0038822987116873264, -0.007127377670258284, -0.03903530538082123, 0.0031897309236228466, 0.006337149068713188, 0.02823573537170887, 0.00013904759543947875, ...]"
1,1,1,"What is Python?</h1>\n<p>Python is a programming language. It's used for many different\napplications. It's used in some high schools and colleges as an\nintroductory programming language because Python is easy to learn, but\nit's also used by professional software developers at places such as\nGoogle, NASA, and Lucasfilm Ltd.</p>\n<p>If you wish to learn more about Python, start with the Beginner's\nGuide to Python.</p>\n<h1>","[0.03812775760889053, 0.03766943886876106, -0.003967163152992725, -0.0042003849521279335, 0.022935397922992706, 0.006581943016499281, 0.00827502366155386, 0.07575546205043793, -0.035610876977443695, -0.026387322694063187, -0.05020380765199661, -0.0034398278221488, -0.06700544059276581, 0.03167089447379112, -0.005307564977556467, 0.022569259628653526, 0.10008034855127335, -0.010286636650562286, 0.011069213971495628, 0.03883926197886467, -0.05287667363882065, 0.015732107684016228, -0.02826469950377941, 0.0283032339066267, -0.01034550741314888, 0.014261618256568909, -0.022535227239131927, 0.011489572934806347, -0.05141935497522354, -0.03669305518269539, 0.016113653779029846, 0.010136461816728115, -0.012747722677886486, -0.0199532862752676, 0.011676655150949955, -0.022923801094293594, 0.00922334287315607, 0.01621725782752037, -0.020995136350393295, 0.008242526091635227, 0.013580135069787502, -0.039491601288318634, -0.03111039102077484, -0.018279066309332848, -0.058238085359334946, 0.007727416232228279, -0.004431350156664848, 0.04834586754441261, -0.02156693860888481, 0.028713161125779152, -0.031465914100408554, 0.01684660278260708, 0.008428304456174374, -0.05439024418592453, 0.003946194890886545, 0.03316352143883705, 0.04028981178998947, -0.05887116864323616, -0.07436404377222061, -0.010170637629926205, 0.07032755762338638, 0.05520424246788025, -0.002896175952628255, 0.023355644196271896, 0.040819235146045685, -0.02434650994837284, 0.0020262887701392174, 0.05515427142381668, -0.05029863119125366, 0.003002970013767481, 
-0.011816968210041523, 0.01944344863295555, -0.09320258349180222, -0.03673619404435158, 0.005701628513634205, 0.01667577587068081, -0.00284001138061285, 0.07346253842115402, 0.044040244072675705, 0.03383155167102814, 0.020258793607354164, 0.06361281871795654, -0.009843576699495316, 0.0434802807867527, 0.014793848618865013, -0.013666527345776558, -0.047014929354190826, -0.01191718690097332, 0.0029061464592814445, 0.012642161920666695, -0.02376134693622589, -0.06065330281853676, 0.03872058913111687, 0.011114862747490406, 0.027162477374076843, -0.027447674423456192, 0.005610770545899868, -0.04238266870379448, -0.0020114330109208822, 0.01918358914554119, ...]"


In [65]:
hdf_faq_gold.select_statement

'SELECT T0."ID", T0."SUB_ID", T0."CONTENT", T1."VECTOR_COL_CONTENT"\nFROM (SELECT * FROM "#PAL_CHUNKING_RESULT_TBL_1E1607FC_F491_11EF_BDE1_9A1A5F2997FD") T0 INNER JOIN (SELECT "SUB_ID", "VECTOR_COL" AS "VECTOR_COL_CONTENT" FROM (SELECT "SUB_ID", "VECTOR_COL" FROM (SELECT * FROM "#PAL_EMBEDDINGS_RESULT_TBL_0_293DED88_F492_11EF_BDE1_9A1A5F2997FD") AS "DT_123") AS "DT_127") T1\n ON T0."SUB_ID" = T1."SUB_ID"\n'

In [66]:
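# Persist the embeddings in a local temporary table ('#' prefix), so the search below
# selects from one named table instead of the joined PAL result tables
hdf_faq_gold = hdf_faq_gold.save(where="#FAQ_EMBEDDINGS", force=True)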

## Semantic search in FAQ

In [105]:
prompt="How to learn Python?"

In [106]:
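# Double single quotes so a prompt containing an apostrophe does not break the SQL string literal
safe_prompt = prompt.replace("'", "''")
df_result = myconn.sql(
    f"""SELECT TOP 5
    COSINE_SIMILARITY(VECTOR_EMBEDDING('{safe_prompt}', 'QUERY', 'SAP_NEB.20240715'), "VECTOR_COL_{content_column}") AS "SIMILARITY",
    "ID", "{content_column}"
    FROM ({hdf_faq_gold.select_statement})
    ORDER BY 1 DESC;
    """
).collect()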
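`COSINE_SIMILARITY` scores each stored chunk vector against the embedding of the prompt, and `TOP 5 ... ORDER BY 1 DESC` keeps the five best. A pure-Python sketch of the same ranking on toy 3-dimensional vectors (real embeddings from `SAP_NEB.20240715` are much longer; `cosine_similarity` and `top_k` are hypothetical helpers):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], vectors: dict[int, list[float]], k: int = 5) -> list[tuple[float, int]]:
    """Return (similarity, key) pairs, best first: the analogue of TOP k ... ORDER BY 1 DESC."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    return sorted(scored, reverse=True)[:k]


# Toy vectors standing in for stored chunk embeddings, keyed by a chunk ID
chunk_vectors = {0: [1.0, 0.0, 0.0], 1: [3.0, 4.0, 0.0], 2: [0.0, 0.0, 2.0]}
print(top_k([1.0, 0.0, 0.0], chunk_vectors, k=2))  # [(1.0, 0), (0.6, 1)]
```

Both vectors must be non-zero; a zero vector makes the denominator zero and raises `ZeroDivisionError` in this sketch.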

In [107]:
df_result.head(5)

,SIMILARITY,ID,CONTENT
0,0.794223,9,"I've never programmed before. Is there a Python tutorial?</h2>\n<p>There are numerous tutorials and books available. The standard\ndocumentation includes The Python Tutorial.</p>\n<p>Consult the Beginner's Guide to find information for beginning Python\nprogrammers, including lists of tutorials.</p>\n<h2>"
1,0.736561,9,"What is Python?</h2>\n<p>Python is an interpreted, interactive, object-oriented programming\nlanguage. It incorporates modules, exceptions, dynamic typing, very\nhigh level dynamic data types, and classes. It supports multiple\nprogramming paradigms beyond object-oriented programming, such as\nprocedural and functional programming. Python combines remarkable\npower with very clear syntax. It has interfaces to many system calls\nand libraries, as well as to various window systems, and is extensible\nin C or C++. It is also usable as an extension language for\napplications that need a programmable interface. Finally, Python is\nportable: it runs on many Unix variants including Linux and macOS, and\non Windows.</p>\n<p>To find out more, start with The Python Tutorial. The Beginner's\nGuide to Python links to other introductory tutorials and resources\nfor learning Python.</p>\n<h2>"
2,0.724282,1,"What is Python?</h1>\n<p>Python is a programming language. It's used for many different\napplications. It's used in some high schools and colleges as an\nintroductory programming language because Python is easy to learn, but\nit's also used by professional software developers at places such as\nGoogle, NASA, and Lucasfilm Ltd.</p>\n<p>If you wish to learn more about Python, start with the Beginner's\nGuide to Python.</p>\n<h1>"
3,0.711393,9,"Is Python a good language for beginning programmers?</h2>\n<p>Yes.</p>\n<p>It is still common to start students with a procedural and statically\ntyped language such as Pascal, C, or a subset of C++ or Java.\nStudents may be better served by learning Python as their first\nlanguage. Python has a very simple and consistent syntax and a large\nstandard library and, most importantly, using Python in a beginning\nprogramming course lets students concentrate on important programming\nskills such as problem decomposition and data type design. With\nPython, students can be quickly introduced to basic concepts such as\nloops and procedures. They can probably even work with user-defined\nobjects in their very first course.</p>\n<p>For a student who has never programmed before, using a statically\ntyped language seems unnatural. It presents additional complexity\nthat the student must master and slows the pace of the course. The\nstudents are trying to learn to think like a computer, decompose\nproblems, design consistent interfaces, and encapsulate data. While\nlearning to use a statically typed language is important in the long\nterm, it is not necessarily the best topic to address in the students'\nfirst programming course.</p>\n<p>Many other aspects of Python make it a good first language. Like\nJava, Python has a large standard library so that students can be\nassigned programming projects very early in the course that <em>do</em>\nsomething. Assignments aren't restricted to the standard four-\nfunction calculator and check balancing programs. By using the\nstandard library, students can gain the satisfaction of working on\nrealistic applications as they learn the fundamentals of programming.\nUsing the standard library also teaches students about code reuse.\nThird-party modules such as PyGame are also helpful in extending the\nstudents' reach.</p>\n<p>Python's interactive interpreter enables students to test language\nfeatures while they're programming. 
They can keep a window with the\ninterpreter running while they enter their program's source in another\nwindow. If they can't remember the methods for a list, they can do\nsomething like this:</p>\n<blockquote>\n<blockquote>\n<blockquote>\n<p>L = []\ndir(L)\n ['<strong>add</strong>', '<strong>class</strong>', '<strong>contains</strong>', '<strong>delattr</strong>', '<strong>delitem</strong>',\n '<strong>dir</strong>', '<strong>doc</strong>', '<strong>eq</strong>', '<strong>format</strong>', '<strong>ge</strong>',\n '<strong>getattribute</strong>', '<strong>getitem</strong>', '<strong>gt</strong>', '<strong>hash</strong>', '<strong>iadd</strong>',\n '<strong>imul</strong>', '<strong>init</strong>', '<strong>iter</strong>', '<strong>le</strong>', '<strong>len</strong>', '<strong>lt</strong>',\n '<strong>mul</strong>', '<strong>ne</strong>', '<strong>new</strong>', '<strong>reduce</strong>', '<strong>reduce_ex</strong>',\n '<strong>repr</strong>', '<strong>reversed</strong>', '<strong>rmul</strong>', '<strong>setattr</strong>', '<strong>setitem</strong>',\n '<strong>sizeof</strong>', '<strong>str</strong>', '<strong>subclasshook</strong>', 'append', 'clear',\n 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove',\n 'reverse', 'sort']\n[d for d in dir(L) if '__' not in d]\n ['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']</p>\n<p>help(L.append)\n Help on built-in function append:</p>\n</blockquote>\n</blockquote>\n</blockquote>\n<p>append(...)\n L.append(object) -&gt; None -- append object to end</p>\n<blockquote>\n<blockquote>\n<blockquote>\n<p>L.append(1)\nL\n [1]</p>\n</blockquote>\n</blockquote>\n</blockquote>\n<p>With the interpreter, documentation is never far from the student as\nthey are programming.</p>\n<p>There are also good IDEs for Python. IDLE is a cross-platform IDE for\nPython that is written in Python using Tkinter. 
Emacs users will be\nhappy to know that there is a very good Python mode for Emacs. All of\nthese programming environments provide syntax highlighting, auto-\nindenting, and access to the interactive interpreter while coding.\nConsult the Python wiki for a full list of Python editing\nenvironments.</p>\n<p>If you want to discuss Python's use in education, you may be\ninterested in joining the edu-sig mailing list.</p>"
4,0.710006,1,"<p>""Why is Python Installed on my Computer?"" FAQ</p>\n<hr />\n<h1>"
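Three of the five hits come from document ID 9, so the list repeats one FAQ page. If one result per FAQ page is preferred, a simple option is to keep each document's best score. The sketch below does that on the (SIMILARITY, ID) pairs from the result above; in SQL, a `GROUP BY "ID"` with `MAX(...)` over the similarity would be the analogous step.

```python
# (SIMILARITY, ID) pairs copied from the top-5 result above
hits = [(0.794223, 9), (0.736561, 9), (0.724282, 1), (0.711393, 9), (0.710006, 1)]

best_per_doc: dict[int, float] = {}
for score, doc_id in hits:
    # keep only the highest-scoring chunk for each source document
    best_per_doc[doc_id] = max(score, best_per_doc.get(doc_id, float("-inf")))

print(best_per_doc)  # {9: 0.794223, 1: 0.724282}
```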


In [102]:
# Print the full text of the top-ranked chunk
print(df_result['CONTENT'][0])

Why was Python created in the first place?</h2>
<p>Here's a <em>very</em> brief summary of what started it all, written by Guido
van Rossum:</p>
<p>I had extensive experience with implementing an interpreted
   language in the ABC group at CWI, and from working with this group
   I had learned a lot about language design.  This is the origin of
   many Python features, including the use of indentation for
   statement grouping and the inclusion of very-high-level data types
   (although the details are all different in Python).</p>
<p>I had a number of gripes about the ABC language, but also liked
   many of its features.  It was impossible to extend the ABC language
   (or its implementation) to remedy my complaints -- in fact its lack
   of extensibility was one of its biggest problems.  I had some
   experience with using Modula-2+ and talked with the designers of
   Modula-3 and read the Modula-3 report. Modula-3 is the origin of
   the syntax and semantics used for exceptions, and

In [108]:
from IPython.display import HTML

# Render the HTML of the fourth-ranked chunk (index 3) in the notebook
display(HTML(df_result['CONTENT'][3]))
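The chunks are HTML fragments with dangling tags, such as a trailing `<h1>`. When plain text is needed, for example to print a chunk compactly, the standard library's `html.parser` can strip the tags. A minimal sketch (`html_to_text` is a hypothetical helper):

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment, ignoring all tags."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &gt; becomes '>'
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(fragment: str) -> str:
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    # Collapse runs of whitespace left behind by removed tags and line wraps
    return " ".join("".join(parser.parts).split())


chunk = "What is Python?</h1>\n<p>Python is a programming language.</p>\n<h1>"
print(html_to_text(chunk))  # What is Python? Python is a programming language.
```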