In [1]:
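# Install the Hugging Face Transformers library and its dependencies.
# AutoModelForQuestionAnswering needs a deep learning backend, so PyTorch is installed as well
# (see https://pytorch.org/get-started/locally/ for environment-specific instructions).
! pip install -U transformers
! pip install -U torch
! pip install -U datasets
! pip install -U sentencepiece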

# Import the necessary modules
import pandas as pd
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, QuestionAnsweringPipeline

# Load the transcripts from the csv file named "cleaned_output_file.csv"
transcripts = pd.read_csv('cleaned_output_file.csv')

# Convert the transcripts dataframe to a dictionary with id as the key and text as the value
transcripts = transcripts.set_index('id')['text'].to_dict()

# Load the test data from the csv file named "test.csv"
test = pd.read_csv('test.csv')

# Load the pre-trained tokenizer and model for question answering.
# Any model from https://huggingface.co/models?filter=question-answering can be used;
# this notebook uses bert-large-uncased-whole-word-masking-finetuned-squad.
model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)

# Create a question answering pipeline using the tokenizer and model
qa_pipeline = QuestionAnsweringPipeline(model=model, tokenizer=tokenizer)

# Initialize an empty list to store the answers
answers = []

# Loop over all the rows in the test data
for _, row in test.iterrows():
  # Get the question and the transcript id from the row
  question = row['Question']
  transcript_id = row['Transcript']

  # Get the transcript text from the transcripts dictionary
  transcript = transcripts[transcript_id]

  # Use the question answering pipeline to extract an answer span from the transcript.
  # The pipeline returns a dict with the keys 'score', 'start', 'end' and 'answer'.
  result = qa_pipeline(question=question, context=transcript)

  # Append the answer text (not the whole result dict) and the question id to the answers list
  answers.append([result['answer'], row['Id']])

# Convert the answers list to a pandas dataframe
answers = pd.DataFrame(answers, columns=["Text", "Id"])

# Save the answers dataframe to a csv file
answers.to_csv("sample_submission_8nov.csv", index=False)

Collecting datasets
  Obtaining dependency information for datasets from https://files.pythonhosted.org/packages/7c/55/b3432f43d6d7fee999bb23a547820d74c48ec540f5f7842e41aa5d8d5f3a/datasets-2.14.6-py3-none-any.whl.metadata
  Downloading datasets-2.14.6-py3-none-any.whl.metadata (19 kB)
Collecting xxhash (from datasets)
  Obtaining dependency information for xxhash from https://files.pythonhosted.org/packages/b7/3a/74a609706ef4430fe6d041a3b8d209882c15440b695e373fe26d48c6f35c/xxhash-3.4.1-cp311-cp311-win_amd64.whl.metadata
  Downloading xxhash-3.4.1-cp311-cp311-win_amd64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Obtaining dependency information for multiprocess from https://files.pythonhosted.org/packages/e7/41/96ac938770ba6e7d5ae1d8c9cafebac54b413549042c6260f0d0a6ec6622/multiprocess-0.70.15-py311-none-any.whl.metadata
  Downloading multiprocess-0.70.15-py311-none-any.whl.metadata (7.2 kB)
Collecting dill<0.3.8,>=0.3.0 (from datasets)
  Obtaining dependency informatio

None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.


Downloading (…)okenizer_config.json:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


Downloading (…)lve/main/config.json:   0%|          | 0.00/443 [00:00<?, ?B/s]

Downloading (…)solve/main/vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

ImportError: 
AutoModelForQuestionAnswering requires the PyTorch library but it was not found in your environment. Checkout the instructions on the
installation page: https://pytorch.org/get-started/locally/ and follow the ones that match your environment.
Please note that you may need to restart your runtime after installation.
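
The ImportError above is raised because no deep learning backend is installed in the environment. As a minimal sketch, assuming PyTorch is the intended backend, a check like the following could run before the model is loaded so the problem surfaces with a clear message. The helpers `backend_available` and `require_backend` are hypothetical names introduced for illustration; they are not part of Transformers.

```python
import importlib.util


def backend_available(module_name: str) -> bool:
    """Return True if the named top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


def require_backend(module_name: str = "torch") -> None:
    """Raise a clear error early if the backend module is missing.

    Assumption: PyTorch ("torch") is the backend the notebook intends to use.
    """
    if not backend_available(module_name):
        raise RuntimeError(
            f"'{module_name}' is not installed. Install it (for PyTorch, see "
            "https://pytorch.org/get-started/locally/) and restart the kernel."
        )


if __name__ == "__main__":
    # Report whether the assumed backend is present without importing it.
    print("torch available:", backend_available("torch"))
```

Calling `require_backend()` at the top of the cell, before `AutoModelForQuestionAnswering.from_pretrained(...)`, would stop execution before any model files are downloaded. After installing PyTorch, the kernel usually has to be restarted, as the error message notes.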
