# Extractive QA



1. Scraping Dell Website (Data Collection)
1. Cleaning dataset
1. QA Fromatting using Haystack Annotation tool (SQUAD format)
1. Tokenisation, DocumentStore (FAISS, InMemoryDocumentStore), Retriever (DPR), Reader (FARMReader)

# Scraping

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re
import uuid
import os
from tqdm import tqdm

import time
from selenium import webdriver
# for headless
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

import json

In [3]:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36'}



# url = 'https://www.dell.com/community/PowerEdge-Hardware-General/PowerEdge-R730XD-Loading-BIOS-Drivers/m-p/7644944#M65318'
url = 'https://www.dell.com/community/PowerEdge-Hardware-General/bd-p/PowerEdge-General-HW'
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')

print(soup.title.text)


	PowerEdge Hardware General - Dell Community



### Dell Website: Scraping for URLs

- For this project will exclusivley look at Hardware Dell suppport forums, we could later incorporate other fields such as Software issues and label entries as such. But for now, focusing on Hardware `https://www.dell.com/community/PowerEdge-Hardware-General/bd-p/PowerEdge-General-HW`.
- ![image.png](attachment:b29a65b0-b725-42c8-8cc4-db5fb2beea43.png)
- We want to land on the website and pull `URLs` of Solved Dell forum posts and save them as a list (search for HTML elements `//a[@href]`.
- As this internet network is based in **EU**, there is a `Accept Cookies` button that can pop up when the webpage is first openedthat needs to be clicked.
- Using the Selium library, will first filter by `Solved` cases in order to scrape forums that have a solution (using Chrome Developer tool helps identify the HTML element names):
- ![image.png](attachment:03b72c12-8747-41fe-9115-c2f878da3386.png)
- After Solved button will need to scroll down to `Load More`. Note a `MoveTargetOutOfBoundsException` error can occur when the scroll action goes beyond the bounds of the webpage. To avoid this error, we can try a different approach by scrolling to the bottom of the page using `execute_script()` and then clicking the "Load more" button. The parameter `num_clicks` which indicates the number of times to click the "Load more" button.

> One issue, as we click Load More, the website length increases until eventually we cant scroll down to the bottom fast enough in order to click Load More again. As we increase `num_clicks` to be `>90` we hit an error because we are not giving the function enough time, using `time.sleep()`, to scroll down the increasingly larger webpage `ElementClickInterceptedException`. To resolve this we scroll the element into view using the `execute_script` method. This ensures that the element is visible and clickable (`driver.execute_script("arguments[0].scrollIntoView();", load_more_button)
`). 



In [41]:
def automate_dell_forum(num_clicks):
    # Instantiate the Selenium web driver
    driver = webdriver.Chrome()
    driver.maximize_window()  # Maximize the browser window

    # Navigate to the Dell community forum page
    driver.get('https://www.dell.com/community/PowerEdge-Hardware-General/bd-p/PowerEdge-General-HW')

    # Press the "Accept All" button for cookies
    accept_button = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//a[@aria-label="allow cookies"]')))
    accept_button.click()
    
    # Select the "Solved" option from the dropdown
    select_element = driver.find_element(By.ID, 'messages-loader-type')
    option_solved = select_element.find_element(By.XPATH, "//option[@value='solved']")
    option_solved.click()

    # Wait for the page to load after selecting "Solved" option
    time.sleep(2)

    # Scroll to the bottom of the page
    print("scrolled outside loop")
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait for a few seconds BEFORE clicking the "Load more" button
    print("wait(2) outside before loop")
    time.sleep(2)
    

    # Click the "Load more" button the specified number of times
    count = 1
    
    for _ in range(num_clicks):
        load_more_button = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, 'btn-load-more')))
        print("Waiting... for a few seconds BEFORE clicking the Load more button")
        time.sleep(3)
        print("Wait befor done")
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        print("scrolled in loop at", count)
        time.sleep(3)
        print("waited 2 secs after scroll")
        load_more_button.click()
        count += 1
        print("clicked: ", count)
        print("Waiting... for a few seconds AFTER clicking the Load more button")
        time.sleep(6)
        print("Wait after done")
    
        
    
    # Wait for the page to load AFTER clicking "Load more" button
    #print("Waiting... for a few seconds AFTER clicking the Load more button")
    print("wait(2) outside after loop")

    time.sleep(2)
    
    #print("Wait after done")
    

    # Get all href urls on the page and save them to a list
    urls = driver.find_elements(By.XPATH, '//a[@href]')
    url_list = [url.get_attribute('href') for url in urls]

    # Close the Selenium web driver
    driver.quit()

    return url_list

# Call the method to automate the process and get the url list, num_of_clicks=
url_list = automate_dell_forum(189)

# Print the urls (forum posts)
#for url in url_list:
#    print(url)

scrolled outside loop
wait(2) outside before loop
Waiting... for a few seconds BEFORE clicking the Load more button
Wait befor done
scrolled in loop at 1
waited 2 secs after scroll
clicked:  2
Waiting... for a few seconds AFTER clicking the Load more button
Wait after done
Waiting... for a few seconds BEFORE clicking the Load more button
Wait befor done
scrolled in loop at 2
waited 2 secs after scroll
clicked:  3
Waiting... for a few seconds AFTER clicking the Load more button
Wait after done
Waiting... for a few seconds BEFORE clicking the Load more button
Wait befor done
scrolled in loop at 3
waited 2 secs after scroll
clicked:  4
Waiting... for a few seconds AFTER clicking the Load more button
Wait after done
Waiting... for a few seconds BEFORE clicking the Load more button
Wait befor done
scrolled in loop at 4
waited 2 secs after scroll
clicked:  5
Waiting... for a few seconds AFTER clicking the Load more button
Wait after done
Waiting... for a few seconds BEFORE clicking the Load 

In [42]:
import csv

with open('list_189.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(url_list)

In [43]:
with open('list_189.txt', 'w') as file:
    for item in url_list:
        file.write(str(item) + '\n')

In [44]:
len(url_list) #189

7675

In [26]:
len(url_list)

3713

In [None]:
clicked_200 = url_list[:]

In [27]:

clicked_90 = url_list[:]



In [None]:
len(clicked_90)

In [18]:
len(execute_script_10)

113

In [14]:
len(clicked__urls_60)

2513

In this updated version, the waiting time after clicking the "Load more" button is dynamically increased by adding a fraction (0.5 seconds) of the current click count (count) to the base waiting time of 3 seconds. This ensures that the waiting time increases as more clicks are made, allowing the webpage to load fully before attempting to click again. Adjust the fraction (0.5) as needed based on your observations of the webpage loading time.

### Dell Website: Filter  and clean URLs

- 

In [45]:
def filtered_urls(url_list):
    filtered_urls = []
    exclude_urls = [
        'https://www.dell.com/community/PowerEdge-Hardware-General/bd-p/PowerEdge-General-HW#',
        'https://www.dell.com/community/PowerEdge-Hardware-General/bd-p/custom.dell.link.solutions.href',
        'https://www.dell.com/community/PowerEdge-Hardware-General/bd-p/custom.dell.link.careers.href',
        'https://www.dell.com/community/PowerEdge-Hardware-General/bd-p/custom.dell.link.about.href',
        'https://www.dell.com/community/PowerEdge-Hardware-General/bd-p/PowerEdge-General-HW'
    ]
    for url in url_list:
        if url.startswith('https://www.dell.com/community/PowerEdge-Hardware-General/') and url not in exclude_urls:
            filtered_urls.append(url)
    return filtered_urls

# Filter the urls
filtered_urls = filtered_urls(url_list)

In [46]:
len(filtered_urls)

1901

### Dell website: Visit forum post and pull Answers, Question text and URLs linking to Support documents (Context)

- We want to structure the dataset as Question (the forum query asked by a user), Answer (The community agreed solution) and importantly what we will be calling Context (the text pulled from a linked suppporting artcile). But the website stricture is a little messy, but after some trial and error was able to identify that:
    - The first `lia-message-body-content` always references the Question
    - The second `lia-message-body-content` is always the pinned answer (for "Solved" forum posts)
    - So we will simple combine and save the first `lia-message-body-content` into Questions and the second into Answers
    - Any refered to `https://` in the Answer to be saved for later use as Context.
- ![image.png](attachment:dafb209d-fa02-4cae-8bd0-0d0bba99a0b4.png)

To speed up the `extract_elements_with_class` function and incorporate tqdm for a progress bar, you can make the following modifications:

1. Use `tqdm` to create a progress bar: Import the `tqdm` module and wrap the `urls` list with `tqdm` to create a progress bar that shows the status of the extraction process.

2. Utilize `Session` object from the `requests` module: Instead of creating a new `requests` session for each URL, you can use a `Session` object to take advantage of connection pooling and improve performance.

3. Use `find` instead of `find_all`: Since you only need the first two elements with the specified class name, you can use `find` instead of `find_all` to improve performance.

Here's the modified code:

```python
from tqdm import tqdm
import requests
import pandas as pd
from bs4 import BeautifulSoup

def extract_elements_with_class(urls, class_name):
    elements_list = []
    session = requests.Session()
    for url in tqdm(urls, desc="Extracting elements"):
        try:
            response = session.get(url)
            soup = BeautifulSoup(response.content, 'html.parser')
            elements = soup.find_all(class_=class_name, limit=2)  # Limit to the first two elements
            for element in elements:
                elements_list.append(element.text.strip())
            for _ in range(2 - len(elements)):
                elements_list.append("")  # Append empty strings if elements are not found
        except requests.exceptions.RequestException:
            elements_list.extend(["", ""])  # Append empty strings if there's an error
    return elements_list

class_name = "lia-message-body-content"
extracted_elements = extract_elements_with_class(filtered_urls, class_name)

# Ensure extracted_elements has an even number of elements
if len(extracted_elements) % 2 != 0:
    extracted_elements.append("")  # Append an empty string to make it even

# Split the extracted elements into Questions and Answers lists
Questions = extracted_elements[::2]
Answers = extracted_elements[1::2]

# Create a dataframe called QA
QA = pd.DataFrame({"Questions": Questions, "Answers": Answers})
```

In the modified code, `tqdm` is used to create a progress bar that shows the status of the extraction process. The `requests.Session()` object is created outside the loop to take advantage of connection pooling. Additionally, `find_all` is limited to the first two elements (`limit=2`), and the empty strings are appended using a loop to ensure there are always two elements for each URL.

These modifications should improve the speed of the function and provide a progress bar using `tqdm` to monitor the extraction process.

In [47]:
from tqdm import tqdm
import requests
import pandas as pd
from bs4 import BeautifulSoup

def extract_elements_with_class(urls, class_name):
    elements_list = []
    session = requests.Session()
    for url in tqdm(urls, desc="Extracting elements"):
        try:
            response = session.get(url)
            soup = BeautifulSoup(response.content, 'html.parser')
            elements = soup.find_all(class_=class_name, limit=2)  # Limit to the first two elements
            for element in elements:
                elements_list.append(element.text.strip())
            for _ in range(2 - len(elements)):
                elements_list.append("")  # Append empty strings if elements are not found
        except requests.exceptions.RequestException:
            elements_list.extend(["", ""])  # Append empty strings if there's an error
    return elements_list

class_name = "lia-message-body-content"
extracted_elements = extract_elements_with_class(filtered_urls, class_name)

# Ensure extracted_elements has an even number of elements
if len(extracted_elements) % 2 != 0:
    extracted_elements.append("")  # Append an empty string to make it even

# Split the extracted elements into Questions and Answers lists
Questions = extracted_elements[::2]
Answers = extracted_elements[1::2]

# Create a dataframe called QA
QA_large = pd.DataFrame({"Questions": Questions, "Answers": Answers})

Extracting elements: 100%|██████████| 1901/1901 [54:27<00:00,  1.72s/it] 


In [48]:
QA_large.head()

Unnamed: 0,Questions,Answers
0,"We provide you a variety of support related videos on our DELL EMC Support YouTube channel. We publish at least one new video every week so make sure you subscribe and stay up to date with the latest turoials, tipps and tricks about server, storage and networking.\n \nHere are some playlist you might find helpful for your daily business regarding DELL EMC Enterprise hardware.\n\nOpenManage Server Administrator \nRAID - Tutorials, Information and Troubleshooting \nDell EMC QuickTips - something about everything \niDRAC - Setup, Configuration, Troubleshooting \nDell Lifecycle Controller \nSupportAssist Enterprise Virtual Edition \n\nYou can find the full list here.\n \nSomething is missing? Got a topic we should cover in one of our videos? Feel free to suggest new topics and give us feedback to existing ones in this thread.","Hi All,\nthere are 2 new videos up on the channel. As always a Quick Tip Video on Service Tag locations and in addition we show you how to install OMSA on ESXi 6.7\nAnd don't forget to like the videos and subscribe to our channel for all the latest updates!"
1,"Hello,We were asked by DELL supporter to update iDRAC again for a hardware support, from 6.02.00.00 to 6.10.00.00. After updating, the secure web access to iDRAC failed with error:Bad RequestYour browser sent a request that this server could not understand.Additionally, a 400 Bad Request error was encountered while trying to use an ErrorDocument to handle the request.After doing some tests, I have the following conclusions/workarounds:1. the secure web access failed when using the FQDN of iDRAC interface2. using the IP or short hostname works with the secure web access3. setting iDRAC.WebServer.ManualDNSEntry to have the FQDN included won't solve the issue4. disabling idrac.webserver.HostHeaderCheck aslo works, but could not open virtual console viewer I tried to use racadm command 'sslresetcfg' to regenerate the certificate, but only short hostname used as Common Name (CN) and also only the short hostname listed in the DNS alternative name. By the way, we have DNS BMC/RAC Name and DNS Domain name correctly configured.It looks like to me a new bug in version 6.10.00.00. Though there are not issues with the IP and short hostname access, it is still annoying since we have FQDN defined and linked everywhere. Thanks,Di\n\n\n\n\t\t\t\t\t\n\t\t\t\t\t\tSolved!\n\t\t\t\t\t\n\t\t\t\t\tGo to Solution.","Indeed, this seems fixed in later version. I just updated one of our nodes to version 6.10.80.00, the issue described in my first post has gone. When I was testing iDRAC version 6.10.30.00, the issue was sill there.\n\n\nView solution in original post"
2,"After a long power outage, the accountant decided to turn on the server on her own.After pressing the button ""i"" and holding it for a little longer, she reset the settings iDRAC along with the license.How can I restore it now? The server was purchased in 2014. Service Tag <Service Tag was removed>.\n \n\n\n\n\t\t\t\t\t\n\t\t\t\t\t\tSolved!\n\t\t\t\t\t\n\t\t\t\t\tGo to Solution.","Hi, Sergei 66,\n \niDRAC license was sent to you.\n \nPlease ask me if you have any questions, \n \nThank you,\nMaria Januszka\n#IWork4Dell\nDell | Social Outreach Services - Enterprise\n\nMaria JSocial Media and Communities Professional Dell Technologies | Enterprise Support Services #Iwork4Dell Did I answer your query? Please click on ‘Accept as Solution’‘Kudo’ the posts you like!\n\n\n\nView solution in original post"
3,"Hi Dell Team,We are currently experiencing an issue with the ServiceTag on our Dell server. When attempting to enter the tag on Dell's website, an error message stating ""Service Tag or Product ID Search Error"" is displayed. could you please advise? Best regards!\n\n\n\n\t\t\t\t\t\n\t\t\t\t\t\tSolved!\n\t\t\t\t\t\n\t\t\t\t\tGo to Solution.","It works now, Thank you so much Young E.Have a great day.\n\n\nView solution in original post"
4,"Hello! I want to perform a memory upgrade on my R6515 server.Currently the server has 16x32GB DDR4-2400 RDIMMI bought 16x64GB DDR4-2400 LRDIMM. After the memory upgrade the server only contained these LRDIMM memory modules and nothing was mixed. They were the same brand, speed, type and so just like before.The issue I'm having is that when trying to boot the server with the new memory it doesn't get past the ""Please wait while system is initializing"". It's stuck there forever with no obvious errors.I've tried to pull out the pcie u.2 disks. Try booting with one power supply instead of two. Tried clearing the CMOS but nothing seemed to have solved the issue. If I remove the new ram and put the old ram back in again the system boots fine. It's not clear to me why it's not working because in the specifications the Poweredge R6515 supports both LRDIMM and RDIMM\n\n\n\n\t\t\t\t\t\n\t\t\t\t\t\tSolved!\n\t\t\t\t\t\n\t\t\t\t\tGo to Solution.","SDeltaE,\n \n \nThat is the correct dimm, as you can see the part number I provided embedded under the Manufacturer Part number. To answer your question it doesn't appear that it would run at 3200 when fully populated, it looks to run at 2933 when more than 2 dimms per channel are installed, as seen on page 48 here.\n \n \n \n\nDELL-Chris HSocial Media and Communities ProfessionalDell Technologies | Enterprise Support Services#IWork4DellDid I answer your query? Please click on ‘Accept as Solution’. ‘Kudo’ the posts you like!\n\n\n\nView solution in original post"


In [7]:
# import


# Read the CSV file into a DataFrame
QA_large = pd.read_csv("QA_large_cleaned.csv")

In [6]:
# export
QA_large.to_csv('QA_large_cleaned.csv', index=False)

NameError: name 'QA_large' is not defined

### Data Cleaning

In [8]:
# Set pandas options to display all columns and rows without truncation
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)

In [9]:
# Create a new dataframe for cleaned data
QA_cleaned = QA_large.copy()

# Clean up the "Questions" and "Answers" columns
QA_cleaned['Questions'] = QA_cleaned['Questions'].str.replace('Go to Solution', '').str.replace('Solved!', '')
QA_cleaned['Questions'] = QA_cleaned['Questions'].str.replace('\n', '').str.replace('\t', '').str.replace('IWork4Dell', '')
QA_cleaned['Questions'] = QA_cleaned['Questions'].str.replace("Did I answer your query? Please click on ‘Accept as Solution’. ‘Kudo’ the posts you like!", '')

QA_cleaned['Answers'] = QA_cleaned['Answers'].str.replace("Did I answer your query? Please click on ‘Accept as Solution’. ‘Kudo’ the posts you like!", '')
QA_cleaned['Answers'] = QA_cleaned['Answers'].str.replace("View solution in original post", '')
QA_cleaned['Answers'] = QA_cleaned['Answers'].str.replace('\n', '').str.replace('\t', '').str.replace('#IWork4Dell', '')


  QA_cleaned['Questions'] = QA_cleaned['Questions'].str.replace("Did I answer your query? Please click on ‘Accept as Solution’. ‘Kudo’ the posts you like!", '')
  QA_cleaned['Answers'] = QA_cleaned['Answers'].str.replace("Did I answer your query? Please click on ‘Accept as Solution’. ‘Kudo’ the posts you like!", '')


In [34]:
# export
QA_cleaned.to_csv('QA_cleaned.csv', index=False)

In [10]:
QA_cleaned.head()

Unnamed: 0,Questions,Answers
0,"We provide you a variety of support related videos on our DELL EMC Support YouTube channel. We publish at least one new video every week so make sure you subscribe and stay up to date with the latest turoials, tipps and tricks about server, storage and networking. Here are some playlist you might find helpful for your daily business regarding DELL EMC Enterprise hardware.OpenManage Server Administrator RAID - Tutorials, Information and Troubleshooting Dell EMC QuickTips - something about everything iDRAC - Setup, Configuration, Troubleshooting Dell Lifecycle Controller SupportAssist Enterprise Virtual Edition You can find the full list here. Something is missing? Got a topic we should cover in one of our videos? Feel free to suggest new topics and give us feedback to existing ones in this thread.","Hi All,there are 2 new videos up on the channel. As always a Quick Tip Video on Service Tag locations and in addition we show you how to install OMSA on ESXi 6.7And don't forget to like the videos and subscribe to our channel for all the latest updates!"
1,"Hello,We were asked by DELL supporter to update iDRAC again for a hardware support, from 6.02.00.00 to 6.10.00.00. After updating, the secure web access to iDRAC failed with error:Bad RequestYour browser sent a request that this server could not understand.Additionally, a 400 Bad Request error was encountered while trying to use an ErrorDocument to handle the request.After doing some tests, I have the following conclusions/workarounds:1. the secure web access failed when using the FQDN of iDRAC interface2. using the IP or short hostname works with the secure web access3. setting iDRAC.WebServer.ManualDNSEntry to have the FQDN included won't solve the issue4. disabling idrac.webserver.HostHeaderCheck aslo works, but could not open virtual console viewer I tried to use racadm command 'sslresetcfg' to regenerate the certificate, but only short hostname used as Common Name (CN) and also only the short hostname listed in the DNS alternative name. By the way, we have DNS BMC/RAC Name and DNS Domain name correctly configured.It looks like to me a new bug in version 6.10.00.00. Though there are not issues with the IP and short hostname access, it is still annoying since we have FQDN defined and linked everywhere. Thanks,Di.","Indeed, this seems fixed in later version. I just updated one of our nodes to version 6.10.80.00, the issue described in my first post has gone. When I was testing iDRAC version 6.10.30.00, the issue was sill there."
2,"After a long power outage, the accountant decided to turn on the server on her own.After pressing the button ""i"" and holding it for a little longer, she reset the settings iDRAC along with the license.How can I restore it now? The server was purchased in 2014. Service Tag <Service Tag was removed>. .","Hi, Sergei 66, iDRAC license was sent to you. Please ask me if you have any questions, Thank you,Maria JanuszkaDell | Social Outreach Services - EnterpriseMaria JSocial Media and Communities Professional Dell Technologies | Enterprise Support Services #Iwork4Dell Did I answer your query? Please click on ‘Accept as Solution’‘Kudo’ the posts you like!"
3,"Hi Dell Team,We are currently experiencing an issue with the ServiceTag on our Dell server. When attempting to enter the tag on Dell's website, an error message stating ""Service Tag or Product ID Search Error"" is displayed. could you please advise? Best regards!.","It works now, Thank you so much Young E.Have a great day."
4,"Hello! I want to perform a memory upgrade on my R6515 server.Currently the server has 16x32GB DDR4-2400 RDIMMI bought 16x64GB DDR4-2400 LRDIMM. After the memory upgrade the server only contained these LRDIMM memory modules and nothing was mixed. They were the same brand, speed, type and so just like before.The issue I'm having is that when trying to boot the server with the new memory it doesn't get past the ""Please wait while system is initializing"". It's stuck there forever with no obvious errors.I've tried to pull out the pcie u.2 disks. Try booting with one power supply instead of two. Tried clearing the CMOS but nothing seemed to have solved the issue. If I remove the new ram and put the old ram back in again the system boots fine. It's not clear to me why it's not working because in the specifications the Poweredge R6515 supports both LRDIMM and RDIMM.","SDeltaE, That is the correct dimm, as you can see the part number I provided embedded under the Manufacturer Part number. To answer your question it doesn't appear that it would run at 3200 when fully populated, it looks to run at 2933 when more than 2 dimms per channel are installed, as seen on page 48 here. DELL-Chris HSocial Media and Communities ProfessionalDell Technologies | Enterprise Support ServicesDid I answer your query? Please click on ‘Accept as Solution’. ‘Kudo’ the posts you like!"


### Contexts

In [11]:
# Read the CSV file into a DataFrame 
QA_large = pd.read_csv("QA_cleaned.csv")

# Convert the 'Answers' column to string type
QA_large['Answers'] = QA_large['Answers'].astype(str)

# Extract links using regular expression and handle non-string values
QA_large['Context_url'] = QA_large['Answers'].apply(lambda x: re.findall('https://[\S]+', x) if isinstance(x, str) else [])

# Convert the list of links into a single string separated by commas
QA_large['Context_url'] = QA_large['Context_url'].apply(lambda x: ', '.join(x) if x else '')

# Create a new DataFrame called 'QA_docs' with selected columns
QA_docs_v2 = QA_large[['Questions', 'Answers', 'Context_url']].copy()

In [12]:
QA_docs_v2.tail(10)

Unnamed: 0,Questions,Answers,Context_url
901,"I have 2x Dell R610 servers. One proceeds quickly through POST while the other drops into ""Collecting System Inventory"" for about 10 mins before booting.Is there a way to disable the ""collecting system inventory"" on every boot?.","Usao,Yes there is a way to disable Collecting System Inventory on Restart (CSIOR).Access the iDRAC6 Configuration Utility by pressing <CTRL+E> during POST.Select System Services from the main menu.Change the Collect System Inventory on Restart setting to Disabled.See a similar screen shot below:Let me know if you have any further questions",
902,"Hi, Team I have a poweredge T410 with intel E5620 on CPU 1. It runs great but I want to upgrade to XEON X5675 on both CPU slots. I purchase two X5675 and I specifically told them to make sure that they work in tandem both have same specs but different S#s. they work separately fine on CPU one on CPU 2 it halts at boot and give me a message in RED letter to move the chip to CPU1. Order Memory from memory netM393B2G70BH0-YH9Samsung 1x 16GB DDR3-1333 RDIMM PC3L-10600R Dual Rank x4 ModuleBios ver 1.14.0I have followed the poweredge T410 technical guide about memory configuration but nothing. Computer will not Boot up.Help please..",you can find here the updated list of processorhttps://dell.to/32FXYevpage 25Marco B.Social Media and Communities Professional Dell Technologies | Enterprise Support ServicesDid I answer your query? Please click on ‘Accept as Solution’. ‘Kudo’ the posts you like!,https://dell.to/32FXYevpage
903,Server would not power on. Dell tech came and replaced the backplane and PER620 card and then left. The server will now power on but will not boot the existing volume or virtual drive. The server is remote to us and I’m guessing and hoping it’s a simple configuration issue with the new controller but how did the Dell tech know if they didn't see it boot up? The folks on site said the POST does recognize the 1 virtual drive. But it does not boot up. They tried to boot using the BIOS and the UEFI options.Is it something simple like setting the virtual drive as bootable? I thought that some controllers could read the config from the disks?I was hoping there was maybe a tech note on getting the system to boot after replacing the back plane and or PER620 controller..,I looked at the original specs and there was an SD card and Internal Dual SD Module.342-3595 : 1GB SD Card for RIPS331-4441 : Internal Dual SD Module318-2036 : vFlash SD Slot Filler The folks on site took a look and the SD card was popped out or not seated. With the SD card pushed back in the BIOS now sees the SD card and boots from it.,
904,"Hi All,I currently have two Dell Power Edge R720 servers. Both servers have redundant 1100W AC power supplies. Two weeks ago a power outage caused one power supply of each server to malfunction which were replaced under warranty by new power supplies. Since then the PS2 socket of both servers have been reporting an error, VLT0204 System board PS2 PG Fail voltage is outside of range. Contact support.In order to solve the problem I powered off the system and disconnected the power cords and pressed the power button. I then swapped the power supplies re-connected the power cords and powered on the system. I repeated this step for the second server. Both the systems now did not report the VLT0204 error. However, the 'VLT0204 System board PS2 PG Fail voltage is outside of range. Contact support.' error message appeared on the front panel again. Repeating the steps above did not resolve the issue. I realised that this time the VLT0204 error became persistent. Just to be sure that the error is persistent I cleared the hardware logs after which I can still see a critical error in the iDRAC Server Health -> Voltages section.I used the DSET to generate the server report logs which too shows the VLT0204 error message. I can provide the entire zip file if needed.Should I ignore this error message or should I send it to Dell for inspection under warranty. Many Thanks Sohaib.","MSI-PK,If the #2 power supply is removed all together, do you still get the error even when it is removed? Also I would take the server to the Minimum hardware configuration, which is removing everything installed but the following;One Power SupplyControl Panel (for power button functionality)One Processor (CPU) in socket CPU1 (minimum for troubleshooting)One Memory Module (DIMM) installed in socket A1Riser 2 and 3 for single processor configuration OR Riser 1, 2, and 3 for dual processor configuration Power it up with that configuration, you can also swap out Power Supply 1 with 2 to test both power supplies under this configuration, and see if the error is still present. If so, then likely the motherboards voltage regulator was damaged in the power outage you had stated.If the error does clear, try adding the removed parts back to the server individually until the return of the error shows which device is causing the issue. Let me know what you see.DELL-Chris HSocial Media and Communities ProfessionalDell Technologies | Enterprise Support ServicesDid I answer your query? Please click on ‘Accept as Solution’. ‘Kudo’ the posts you like!",
905,"I am looking to run a set of 3 Dell R630's, each with a CPU in Slot 1, all Side A RAM filled.I need to have 4 10 Gig NIC's per machine, so I would like to add 2 expansion cards to each.Is there a configuration that will let CPU 1 run 2 expansion cards without needing to have CPU and RAM in Slots 2 / Side B?.","Hi, It seems that you will need the 2nd CPU. https://dell.to/2SZzv0jDELL-Joey CSocial Media and Communities ProfessionalDell Technologies | Enterprise Support ServicesDid I answer your query? Please click on ‘Accept as Solution’. ‘Kudo’ the posts you like!",https://dell.to/2SZzv0jDELL-Joey
906,"Can I boot off of a 2.5"" U2 NVMe SSD on a PowerEdge R7515 server, assuming it is configured with the SAS/SATA/NVMe backplane (12 SAS/SATA + 12 NVMe option)?My local sales rep claims that I can only boot from NVMe if I go for the 24-port ""NVMe-only"" backplane..","I got in touch with the product manager of the R7515, and he explained the issue.NVMe boot is fully possible on all latest gen PowerEdge servers.But for some reason Dell is not able to deliver the system *with an OS installed* if the system is configured with the SAS/SATA/NVMe backplane. Dell is only able to install the OS on a BOSS card, SD card, or an SAS/SATA drive. If the system is configured with the all-NVMe backplane, then Dell is able to install the OS on a NVMe drive.So as long as you order the system without an OS installed from the factory you are able to use an NVMe drive as boot in any configuration.",
907,"Hi I have an R740 installed in my Customers location. Is is running vmware and idrac 9.It was purchased with the Secure Password for iDracWhen on site to install I hit F2 and configured the network config saved and exited.I can browse to the ip and I get the iDrac login.What should the username be when using the special password on the bottom of the server tab?I leave it blank can't login, I use root, can't login, I use default, can't login.Documentation is vague as to what the username should be.Help, I really don't want to go back onsite. .","The username should still be root, https://dell.to/2P9RgaW“The Secure Password will be located on a sticker on the underside of the system tag with the Service Tag information. If the default password on the sticker is blank, it means that the default password is ""calvin"". ""root"" remains the username.”Thanks,DELL-Josh CrSocial Media and Communities ProfessionalDell Technologies | Enterprise Support ServicesDid I answer your query? Please click on ‘Accept as Solution’. ‘Kudo’ the posts you like!",https://dell.to/2P9RgaW“The
908,"Hi Dell, is R620 EOL? currently R620 is not RHEL8 certified is there plans to certifie? if not please state reason.","Hello,The EOL date is 5 years from sale, with the exception being when hardware is available, it may extend that time to 7 years. As for RHEL 8, if it is going to be designated as a supported OS, I wouldn't expect it for a quarter or two. Updates generally come out in quarterly releases and I wouldn't expect support for a quarter or two, depending on if this is something being worked on. I'm in the process of downloading a RHEL 8 .ISO, and I'd be happy to attempt to install it on a 12G and let you know if it succeeds, if you like.RHEL 8 is supported on the newer 13G and 14G platforms, however. EDIT: I was successful in installing RHEL 8 on an R620, and it appears to work just fine. In my case, the installer did not see the storage attached to the PERC H310 however, and I wound up installing to a USB. The RAID controller shows up in lspci as ""LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon] (rev 03)."" I did not try manually loading a driver, but I did see that RHEL 8 uses a newer kernel, so I would not expect the RHEL 7 driver to work, since it was for an older kernel. You might try it, but from what I've seen so far, I would not expect it to work. I've seen a number of posts in other forums indicating that Redhat disabled support for these controllers, but I can't necessarily vouch for the accuracy of that information. #Iwork4Dell",
909,"Hello everyone,We have just had our first Dell server, the PowerEdge R440. We are excited and also, a bit shame to say we have no experience with the server previously.OK, We did not choose the iDRAC Enterprise when build the server, we selected standard.Today I start setting up the server and in the iDRAC interface, all the contents are ""read-only"" and all options are grey-out or not available to change/save... Is this because we did not have the enterprise licence?We have not config the iDRAC from the System setting yet though...Thank you for your advice.","James, If you are accessing the iDrac and seeing them grayed out the first thing I would look at is if the Local Config using settings is Enabled. If you access the iDrac web interface, not the local server side connection, then got to iDrac Settings - Services - verify if the ""Disable iDrac Local Configuration is set to Enabled on either selection, as seen below. Let me know what you see, and if changing that setting allows you to configure the iDrac as you were. DELL-Chris HSocial Media and Communities ProfessionalDell Technologies | Enterprise Support ServicesDid I answer your query? Please click on ‘Accept as Solution’. ‘Kudo’ the posts you like!",
910,"NormalFri Jul 20 2018 00:31:30Log cleared.CriticalFri Jul 20 2018 09:58:29Memory mirror redundancy is lost. Check memory device at location DIMM_B3.CriticalFri Jul 20 2018 13:38:01Memory mirror redundancy is lost. Check memory device at location DIMM_B3.CriticalFri Jul 20 2018 13:54:37Memory mirror redundancy is lost. Check memory device at location DIMM_B3.CriticalFri Jul 20 2018 14:24:48Memory mirror redundancy is lost. Check memory device at location DIMM_B3.CriticalFri Jul 20 2018 15:47:23Memory mirror redundancy is lost. Check memory device at location DIMM_B3.CriticalFri Jul 20 2018 17:01:30Memory mirror redundancy is lost. Check memory device at location DIMM_B3.CriticalSun Jul 22 2018 00:17:17Memory mirror redundancy is lost. Check memory device at location DIMM_B3. BIOS Version6.6.0 Firmware Version2.90 (Build 04) Lifecycle Controller Firmware1.7.5.4 When I first got this tower, it used this motherboard:0CX0R0 / CX0R01 CPU and maybe 8 or 16GB RAM total.I upgraded to 2x5670 192G RAM but always encountered this error (see event log even though DIMM3 was replaced with new RAM.i upgraded to 2x5690 and that required a new board, so I replaced the board with 09CGW2 / 9CGW2, so now i have new CPU, new board and the error persists.so I'm like ok it's gotta be the RAM. my ram is Kingston DDR3L-1600Mhz all 16GB modules. i think ECC Reg. so i replaced it with Micro DDR3 16GB modules x 2 sticks, 1 in A1 and 1in B1, this is a supported config, no config warnings when POSTing am I supposed to move 2 stick of ram elsewhere not in A1 and B1? Also why is the error persistent and always about DIMM3, currently there is no RAM stick in DIMM3, now remember. I have replaced my motherboard! it can't be the slot!i'm at a loss here, i'm running latest FW across every components, i've rebooted, powered off, unplug power cord, discharge power button then plug in cord and powered it back. Reset all EFI settings/DRAC the works to no avail. Should I give up on 16GB module dream for the T610 system and just go with something else? Is anyone else successfully using 16GB modules in any config (especially the max 192GB variety)? If so please let me know what brand model/make is. i can't buy all the models out there. the compatible RAM list is long aged. does this chipset just like not 16GB? closest error code (which is not marked in the evelogs of DRAC) is: MEM1205Memory mirror redundancy is lost. Check memory device at location(s) .The memory may not be seated correctly, be misconfigured, or it may have failed.Check the memory configuration . Reseat the memory modules. If error remains, swap test the memory module by swapping the module with another identical module in the system, see if the error follows the module or not. If the issue persists, Contact Support as a memory replacement might be needed ^i have combed through all that... remember i have purchased yet one ore vendor RAM! so not the RAM i dont think. Also, 8GB, 4GB all does not exhibit this behavior regardless of B3 is occupied or not... only issue arises on 16GB RAM. memtest86 all came back clean on all the RAM i have for this box <ADMIN NOTE: Broken link has been removed from this post by Dell>.",finally fixed it by reinstalling the OS.so it explains why during POST or hanging in BIOS never did anything to the error message. somehow the older Windows Server OS would corrupt or throw an error code to the memory... maybe it sends something to the openmanage/drac platform? either way it's resolved by reinstalling.,


In [13]:
QA_large.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 911 entries, 0 to 910
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Questions    911 non-null    object
 1   Answers      911 non-null    object
 2   Context_url  911 non-null    object
dtypes: object(3)
memory usage: 21.5+ KB


### Labeling Data with Haystack Annotation Tool


https://www.deepset.ai/blog/labeling-data-with-haystack-annotation-tool


- ![image.png](attachment:888170ad-bf00-48ba-9920-d76c63a91fbe.png)




Please use comma separated `CSVs` and include the header in the first line. You might also want to wrap text around quotation marks, e.g. if it contains commas. We always use pandas to_csv method, this should format the file in the right way.

```
docs.csv:
document_identifier,document_text
id1,bbbbb
id2,muh
```
the same holds for the questions.csv:
```
question,document_identifier,question_identifier
question1,id1,qid1
question2,id2,qid2

```


Lets take a subset to test with

### only use rows with URLs

QA_con = QA_large[df['Context_url'].notna()]


In [18]:
QA_con = QA_large[QA_large['Context_url'].notna()]

In [23]:
QA_con.head()

Unnamed: 0,Questions,Answers,Context_url
0,"We provide you a variety of support related videos on our DELL EMC Support YouTube channel. We publish at least one new video every week so make sure you subscribe and stay up to date with the latest turoials, tipps and tricks about server, storage and networking. Here are some playlist you might find helpful for your daily business regarding DELL EMC Enterprise hardware.OpenManage Server Administrator RAID - Tutorials, Information and Troubleshooting Dell EMC QuickTips - something about everything iDRAC - Setup, Configuration, Troubleshooting Dell Lifecycle Controller SupportAssist Enterprise Virtual Edition You can find the full list here. Something is missing? Got a topic we should cover in one of our videos? Feel free to suggest new topics and give us feedback to existing ones in this thread.","Hi All,there are 2 new videos up on the channel. As always a Quick Tip Video on Service Tag locations and in addition we show you how to install OMSA on ESXi 6.7And don't forget to like the videos and subscribe to our channel for all the latest updates!",
1,"Hello,We were asked by DELL supporter to update iDRAC again for a hardware support, from 6.02.00.00 to 6.10.00.00. After updating, the secure web access to iDRAC failed with error:Bad RequestYour browser sent a request that this server could not understand.Additionally, a 400 Bad Request error was encountered while trying to use an ErrorDocument to handle the request.After doing some tests, I have the following conclusions/workarounds:1. the secure web access failed when using the FQDN of iDRAC interface2. using the IP or short hostname works with the secure web access3. setting iDRAC.WebServer.ManualDNSEntry to have the FQDN included won't solve the issue4. disabling idrac.webserver.HostHeaderCheck aslo works, but could not open virtual console viewer I tried to use racadm command 'sslresetcfg' to regenerate the certificate, but only short hostname used as Common Name (CN) and also only the short hostname listed in the DNS alternative name. By the way, we have DNS BMC/RAC Name and DNS Domain name correctly configured.It looks like to me a new bug in version 6.10.00.00. Though there are not issues with the IP and short hostname access, it is still annoying since we have FQDN defined and linked everywhere. Thanks,Di.","Indeed, this seems fixed in later version. I just updated one of our nodes to version 6.10.80.00, the issue described in my first post has gone. When I was testing iDRAC version 6.10.30.00, the issue was sill there.",
2,"After a long power outage, the accountant decided to turn on the server on her own.After pressing the button ""i"" and holding it for a little longer, she reset the settings iDRAC along with the license.How can I restore it now? The server was purchased in 2014. Service Tag <Service Tag was removed>. .","Hi, Sergei 66, iDRAC license was sent to you. Please ask me if you have any questions, Thank you,Maria JanuszkaDell | Social Outreach Services - EnterpriseMaria JSocial Media and Communities Professional Dell Technologies | Enterprise Support Services #Iwork4Dell Did I answer your query? Please click on ‘Accept as Solution’‘Kudo’ the posts you like!",
3,"Hi Dell Team,We are currently experiencing an issue with the ServiceTag on our Dell server. When attempting to enter the tag on Dell's website, an error message stating ""Service Tag or Product ID Search Error"" is displayed. could you please advise? Best regards!.","It works now, Thank you so much Young E.Have a great day.",
4,"Hello! I want to perform a memory upgrade on my R6515 server.Currently the server has 16x32GB DDR4-2400 RDIMMI bought 16x64GB DDR4-2400 LRDIMM. After the memory upgrade the server only contained these LRDIMM memory modules and nothing was mixed. They were the same brand, speed, type and so just like before.The issue I'm having is that when trying to boot the server with the new memory it doesn't get past the ""Please wait while system is initializing"". It's stuck there forever with no obvious errors.I've tried to pull out the pcie u.2 disks. Try booting with one power supply instead of two. Tried clearing the CMOS but nothing seemed to have solved the issue. If I remove the new ram and put the old ram back in again the system boots fine. It's not clear to me why it's not working because in the specifications the Poweredge R6515 supports both LRDIMM and RDIMM.","SDeltaE, That is the correct dimm, as you can see the part number I provided embedded under the Manufacturer Part number. To answer your question it doesn't appear that it would run at 3200 when fully populated, it looks to run at 2933 when more than 2 dimms per channel are installed, as seen on page 48 here. DELL-Chris HSocial Media and Communities ProfessionalDell Technologies | Enterprise Support ServicesDid I answer your query? Please click on ‘Accept as Solution’. ‘Kudo’ the posts you like!",


### explode any row >1 URL per answer

In [19]:
df_exploded = QA_con.assign(Context_url=QA_large['Context_url'].str.split(',')).explode('Context_url').reset_index(drop=True)


In [20]:
df_exploded.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 994 entries, 0 to 993
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Questions    994 non-null    object
 1   Answers      994 non-null    object
 2   Context_url  994 non-null    object
dtypes: object(3)
memory usage: 23.4+ KB


In [21]:
df_exploded.head()

Unnamed: 0,Questions,Answers,Context_url
0,"We provide you a variety of support related videos on our DELL EMC Support YouTube channel. We publish at least one new video every week so make sure you subscribe and stay up to date with the latest turoials, tipps and tricks about server, storage and networking. Here are some playlist you might find helpful for your daily business regarding DELL EMC Enterprise hardware.OpenManage Server Administrator RAID - Tutorials, Information and Troubleshooting Dell EMC QuickTips - something about everything iDRAC - Setup, Configuration, Troubleshooting Dell Lifecycle Controller SupportAssist Enterprise Virtual Edition You can find the full list here. Something is missing? Got a topic we should cover in one of our videos? Feel free to suggest new topics and give us feedback to existing ones in this thread.","Hi All,there are 2 new videos up on the channel. As always a Quick Tip Video on Service Tag locations and in addition we show you how to install OMSA on ESXi 6.7And don't forget to like the videos and subscribe to our channel for all the latest updates!",
1,"Hello,We were asked by DELL supporter to update iDRAC again for a hardware support, from 6.02.00.00 to 6.10.00.00. After updating, the secure web access to iDRAC failed with error:Bad RequestYour browser sent a request that this server could not understand.Additionally, a 400 Bad Request error was encountered while trying to use an ErrorDocument to handle the request.After doing some tests, I have the following conclusions/workarounds:1. the secure web access failed when using the FQDN of iDRAC interface2. using the IP or short hostname works with the secure web access3. setting iDRAC.WebServer.ManualDNSEntry to have the FQDN included won't solve the issue4. disabling idrac.webserver.HostHeaderCheck aslo works, but could not open virtual console viewer I tried to use racadm command 'sslresetcfg' to regenerate the certificate, but only short hostname used as Common Name (CN) and also only the short hostname listed in the DNS alternative name. By the way, we have DNS BMC/RAC Name and DNS Domain name correctly configured.It looks like to me a new bug in version 6.10.00.00. Though there are not issues with the IP and short hostname access, it is still annoying since we have FQDN defined and linked everywhere. Thanks,Di.","Indeed, this seems fixed in later version. I just updated one of our nodes to version 6.10.80.00, the issue described in my first post has gone. When I was testing iDRAC version 6.10.30.00, the issue was sill there.",
2,"After a long power outage, the accountant decided to turn on the server on her own.After pressing the button ""i"" and holding it for a little longer, she reset the settings iDRAC along with the license.How can I restore it now? The server was purchased in 2014. Service Tag <Service Tag was removed>. .","Hi, Sergei 66, iDRAC license was sent to you. Please ask me if you have any questions, Thank you,Maria JanuszkaDell | Social Outreach Services - EnterpriseMaria JSocial Media and Communities Professional Dell Technologies | Enterprise Support Services #Iwork4Dell Did I answer your query? Please click on ‘Accept as Solution’‘Kudo’ the posts you like!",
3,"Hi Dell Team,We are currently experiencing an issue with the ServiceTag on our Dell server. When attempting to enter the tag on Dell's website, an error message stating ""Service Tag or Product ID Search Error"" is displayed. could you please advise? Best regards!.","It works now, Thank you so much Young E.Have a great day.",
4,"Hello! I want to perform a memory upgrade on my R6515 server.Currently the server has 16x32GB DDR4-2400 RDIMMI bought 16x64GB DDR4-2400 LRDIMM. After the memory upgrade the server only contained these LRDIMM memory modules and nothing was mixed. They were the same brand, speed, type and so just like before.The issue I'm having is that when trying to boot the server with the new memory it doesn't get past the ""Please wait while system is initializing"". It's stuck there forever with no obvious errors.I've tried to pull out the pcie u.2 disks. Try booting with one power supply instead of two. Tried clearing the CMOS but nothing seemed to have solved the issue. If I remove the new ram and put the old ram back in again the system boots fine. It's not clear to me why it's not working because in the specifications the Poweredge R6515 supports both LRDIMM and RDIMM.","SDeltaE, That is the correct dimm, as you can see the part number I provided embedded under the Manufacturer Part number. To answer your question it doesn't appear that it would run at 3200 when fully populated, it looks to run at 2933 when more than 2 dimms per channel are installed, as seen on page 48 here. DELL-Chris HSocial Media and Communities ProfessionalDell Technologies | Enterprise Support ServicesDid I answer your query? Please click on ‘Accept as Solution’. ‘Kudo’ the posts you like!",


### export as csv

In [24]:
df_exploded.to_csv('df_exploded.csv', index=False)

In [2]:
# import

df_exploded = pd.read_csv('df_exploded.csv')

### Context

In [3]:
df_context = df_exploded[df_exploded['Context_url'].str.contains('http://|https://', na=False)]

In [4]:
df_context.head()

Unnamed: 0,Questions,Answers,Context_url
6,"Hi,Just got R710 and for the life of me, I can...",Hello thanks for choosing Dell. Could you try ...,https://dell.to/3pDHVh5
17,"Hi,I see on the manual of the T630 server, on ...","Hello, Yes, it is doable. I am currently runni...",https://i.stack.imgur.com/Spsgx.pngFrom
18,"HelloI got Dell PowerEdge T320 8x3.5"", seems t...",So I've never done this on the T320 specifical...,https://www.dell.com/support/home/en-us/driver...
20,I have to upgrade a customer's Dell Power Edge...,"Hi, i see only 1 here https://dell.to/46iQaQj ...",https://dell.to/46iQaQj
21,I have a R640 here [personal information remov...,"Hello dave_mcl, Here is the link for latest fi...",https://dell.to/3XjjaDc


In [None]:

# Create a function to fetch the text from a given URL
def fetch_text(url):
    try:
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception for non-successful status codes
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"Error fetching URL: {url}")
        print(f"Error message: {str(e)}")
        return ''

# Apply the fetch_text function to each URL in the DataFrame and store the result in a new column called 'Context'
tqdm.pandas()  # Enable progress_apply with tqdm
df_context['Context'] = df_context['Context_url'].progress_apply(fetch_text)

  7%|▋         | 21/313 [00:20<02:35,  1.87it/s]

Error fetching URL: https://www.computerbase.de/forum/threads/proxmox-rechner-startet-permanent-nach-dem-herunterfahren....GRUB_CMDLINE_LINUX_DEFAULT="xhci_hcd.quirks=270336"https://www.truenas.com/community/threads/kernel-boot-parameter-how-to-add-to-tn-scale.110109/It
Error message: 404 Client Error: Not Found for url: https://www.computerbase.de/forum/threads/proxmox-rechner-startet-permanent-nach-dem-herunterfahren....GRUB_CMDLINE_LINUX_DEFAULT=%22xhci_hcd.quirks=270336%22https:/www.truenas.com/community/threads/kernel-boot-parameter-how-to-add-to-tn-scale.110109/It


  8%|▊         | 24/313 [00:22<02:52,  1.67it/s]

Error fetching URL: https://kb.vmware.com/s/article/1007036Dell
Error message: 403 Client Error: Forbidden for url: https://kb.vmware.com/s/article/1007036Dell


 13%|█▎        | 41/313 [02:26<40:50,  9.01s/it]  

Error fetching URL:  https://www.rapidtables.com/convert/number/decimal-to-hex.html?x=75
Error message: 403 Client Error: Forbidden for url: https://www.rapidtables.com/convert/number/decimal-to-hex.html?x=75


 14%|█▍        | 45/313 [02:30<13:11,  2.95s/it]

Error fetching URL: https://www.dell.com/support/manuals/en-us/idrac7-8-lifecycle-controller-v2.30.30.30/lc_2.10.10.10_u...
Error message: 404 Client Error: Not Found for url: https://www.dell.com/support/manuals/en-ie/manuals/idrac7-8-lifecycle-controller-v2.30.30.30/lc_2.10.10.10_u.../Sorry


 15%|█▍        | 46/313 [02:30<09:36,  2.16s/it]

Error fetching URL: https://cloudninjas.com/products/dell-128gb-16-x-8gb-ddr3-1333-mhz-pc3-10600r-ecc-registered-server-...
Error message: 404 Client Error: Not Found for url: https://cloudninjas.com/products/dell-128gb-16-x-8gb-ddr3-1333-mhz-pc3-10600r-ecc-registered-server-...


 19%|█▊        | 58/313 [02:45<06:31,  1.54s/it]

Error fetching URL: https://www.dell.com/support/manuals/en-us/poweredge-r420/r420ownersmanual-v2/technical-specificatio...https://www.dell.com/support/manuals/en-us/poweredge-r520/r520systemsownersmanual/technical-specific...Thanks
Error message: 404 Client Error: Not Found for url: https://www.dell.com/support/manuals/en-us/poweredge-r420/r420ownersmanual-v2/technical-specificatio...https://www.dell.com/support/manuals/en-us/poweredge-r520/r520systemsownersmanual/technical-specific...Thanks


 21%|██        | 66/313 [02:56<07:50,  1.90s/it]

Error fetching URL: https://topics-cdn.dell.com/pdf/boss-s-1_users-guide_en-us.pdfIt
Error message: 404 Client Error: Not Found for url: https://dl.dell.com/topics/pdf/boss-s-1_users-guide_en-us.pdfIt


 27%|██▋       | 83/313 [07:29<1:07:41, 17.66s/it]

Error fetching URL: https://downloads.dell.com/manuals/all-products/esuprt_ser_stor_net/esuprt_poweredge/poweredge-r510_...
Error message: 404 Client Error: Not Found for url: https://dl.dell.com/manuals/all-products/esuprt_ser_stor_net/esuprt_poweredge/poweredge-r510_...


 28%|██▊       | 89/313 [07:38<12:27,  3.34s/it]  

Error fetching URL: https://dell.to/3TlBIQk
Error message: 403 Client Error: Forbidden for url: https://kb.vmware.com/s/article/71367?linkId=185953550


 29%|██▉       | 91/313 [07:39<07:13,  1.95s/it]

Error fetching URL: https://www.dell.com/community/PowerEdge-Hardware-General/SNMP-OID-s-for-System-Board-Exhaust-Temper...
Error message: 404 Client Error: Not Found for url: https://www.dell.com/community/PowerEdge-Hardware-General/SNMP-OID-s-for-System-Board-Exhaust-Temper...


 31%|███▏      | 98/313 [07:53<05:24,  1.51s/it]

Error fetching URL: https://i.dell.com/sites/csdocuments/Shared-Content_data-Sheets_Documents/en/switch-brocade-815-825-...DELL-Dheeraj
Error message: 404 Client Error: Not Found for url: https://i.dell.com/sites/csdocuments/Shared-Content_data-Sheets_Documents/en/switch-brocade-815-825-...DELL-Dheeraj


 32%|███▏      | 101/313 [08:08<10:12,  2.89s/it]

Error fetching URL: https://www.dell.com/support/manuals/en-us/poweredge-fc630/fc630ownersmanual/system-board-jumper-set...https://www.dell.com/support/manuals/en-us/poweredge-fc630/fc630ownersmanual/disabling-a-forgotten-p...
Error message: 404 Client Error: Not Found for url: https://www.dell.com/support/manuals/en-us/poweredge-fc630/fc630ownersmanual/system-board-jumper-set...https://www.dell.com/support/manuals/en-us/poweredge-fc630/fc630ownersmanual/disabling-a-forgotten-p...


 34%|███▍      | 107/313 [08:15<05:58,  1.74s/it]

Error fetching URL: https://dl.dell.com/FOLDER07619118M/1/OM-SrvAdmin-Dell-Web-LX-10.2.0.0-4631_A00.tar.gz10.3.0.0:
Error message: 404 Client Error: Not Found for url: https://dl.dell.com/FOLDER07619118M/1/OM-SrvAdmin-Dell-Web-LX-10.2.0.0-4631_A00.tar.gz10.3.0.0:


In [1]:
df_context.head()

NameError: name 'df_context' is not defined

In [None]:
df_context.to_csv('df_context.csv', index=False)

In [None]:
# Create another new column with the length of the pulled text stored in 'Context'
df_context['Context_Length'] = df_context['Context'].apply(len)

### subset for testing

In [15]:
df_test = df_exploded.head(10).copy()

In [16]:
df_test.head()

Unnamed: 0,Questions,Answers,Context_url
0,"We provide you a variety of support related videos on our DELL EMC Support YouTube channel. We publish at least one new video every week so make sure you subscribe and stay up to date with the latest turoials, tipps and tricks about server, storage and networking. Here are some playlist you might find helpful for your daily business regarding DELL EMC Enterprise hardware.OpenManage Server Administrator RAID - Tutorials, Information and Troubleshooting Dell EMC QuickTips - something about everything iDRAC - Setup, Configuration, Troubleshooting Dell Lifecycle Controller SupportAssist Enterprise Virtual Edition You can find the full list here. Something is missing? Got a topic we should cover in one of our videos? Feel free to suggest new topics and give us feedback to existing ones in this thread.","Hi All,there are 2 new videos up on the channel. As always a Quick Tip Video on Service Tag locations and in addition we show you how to install OMSA on ESXi 6.7And don't forget to like the videos and subscribe to our channel for all the latest updates!",
1,"Hello,We were asked by DELL supporter to update iDRAC again for a hardware support, from 6.02.00.00 to 6.10.00.00. After updating, the secure web access to iDRAC failed with error:Bad RequestYour browser sent a request that this server could not understand.Additionally, a 400 Bad Request error was encountered while trying to use an ErrorDocument to handle the request.After doing some tests, I have the following conclusions/workarounds:1. the secure web access failed when using the FQDN of iDRAC interface2. using the IP or short hostname works with the secure web access3. setting iDRAC.WebServer.ManualDNSEntry to have the FQDN included won't solve the issue4. disabling idrac.webserver.HostHeaderCheck aslo works, but could not open virtual console viewer I tried to use racadm command 'sslresetcfg' to regenerate the certificate, but only short hostname used as Common Name (CN) and also only the short hostname listed in the DNS alternative name. By the way, we have DNS BMC/RAC Name and DNS Domain name correctly configured.It looks like to me a new bug in version 6.10.00.00. Though there are not issues with the IP and short hostname access, it is still annoying since we have FQDN defined and linked everywhere. Thanks,Di.","Indeed, this seems fixed in later version. I just updated one of our nodes to version 6.10.80.00, the issue described in my first post has gone. When I was testing iDRAC version 6.10.30.00, the issue was sill there.",
2,"After a long power outage, the accountant decided to turn on the server on her own.After pressing the button ""i"" and holding it for a little longer, she reset the settings iDRAC along with the license.How can I restore it now? The server was purchased in 2014. Service Tag <Service Tag was removed>. .","Hi, Sergei 66, iDRAC license was sent to you. Please ask me if you have any questions, Thank you,Maria JanuszkaDell | Social Outreach Services - EnterpriseMaria JSocial Media and Communities Professional Dell Technologies | Enterprise Support Services #Iwork4Dell Did I answer your query? Please click on ‘Accept as Solution’‘Kudo’ the posts you like!",
3,"Hi Dell Team,We are currently experiencing an issue with the ServiceTag on our Dell server. When attempting to enter the tag on Dell's website, an error message stating ""Service Tag or Product ID Search Error"" is displayed. could you please advise? Best regards!.","It works now, Thank you so much Young E.Have a great day.",
4,"Hello! I want to perform a memory upgrade on my R6515 server.Currently the server has 16x32GB DDR4-2400 RDIMMI bought 16x64GB DDR4-2400 LRDIMM. After the memory upgrade the server only contained these LRDIMM memory modules and nothing was mixed. They were the same brand, speed, type and so just like before.The issue I'm having is that when trying to boot the server with the new memory it doesn't get past the ""Please wait while system is initializing"". It's stuck there forever with no obvious errors.I've tried to pull out the pcie u.2 disks. Try booting with one power supply instead of two. Tried clearing the CMOS but nothing seemed to have solved the issue. If I remove the new ram and put the old ram back in again the system boots fine. It's not clear to me why it's not working because in the specifications the Poweredge R6515 supports both LRDIMM and RDIMM.","SDeltaE, That is the correct dimm, as you can see the part number I provided embedded under the Manufacturer Part number. To answer your question it doesn't appear that it would run at 3200 when fully populated, it looks to run at 2933 when more than 2 dimms per channel are installed, as seen on page 48 here. DELL-Chris HSocial Media and Communities ProfessionalDell Technologies | Enterprise Support ServicesDid I answer your query? Please click on ‘Accept as Solution’. ‘Kudo’ the posts you like!",
