In [1]:
pip install nltk

Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install pdfplumber

Note: you may need to restart the kernel to use updated packages.


In [3]:
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from itertools import combinations
import pdfplumber

In [4]:
# Download required resources for NLTK
nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\ADMIN\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\ADMIN\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [5]:
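def extract_text_from_pdf(pdf_path):
    with pdfplumber.open(pdf_path) as pdf:
        text = ""
        for page in pdf.pages:
            # Depending on the pdfplumber version, extract_text() may return
            # None for a page with no text layer (e.g. a scanned image), so
            # fall back to an empty string. The newline keeps words on
            # adjacent pages from fusing together.
            text += (page.extract_text() or "") + "\n"
    return text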

In [6]:
def get_mca_questions(context: str):
    # Split the context into sentences
    sentences = sent_tokenize(context)

    # Tokenize each sentence into words
    sentences_tokens = [word_tokenize(sentence) for sentence in sentences]

    # Remove stopwords and punctuation
    stop_words = set(stopwords.words('english'))
    cleaned_sentences = []
    for sentence_tokens in sentences_tokens:
        cleaned_sentence = [word.lower() for word in sentence_tokens if word.isalpha() and word.lower() not in stop_words]
        # Skip sentences that have no words left after cleaning
        if cleaned_sentence:
            cleaned_sentences.append(cleaned_sentence)

    # Generate combinations of cleaned sentences to create questions
    mca_questions = []
    for i, j in combinations(range(len(cleaned_sentences)), 2):
        q1 = " ".join(cleaned_sentences[i])
        q2 = " ".join(cleaned_sentences[j])
        mca_question = f"Q: Which of the following are true about {q1}?"
        mca_question += f"\na. {q1}"
        mca_question += f"\nb. {q2}"
        mca_questions.append(mca_question)

    return mca_questions


In [7]:
pdf_path = "chapter-2.pdf"
pdf_text = extract_text_from_pdf(pdf_path)
mca_questions = get_mca_questions(pdf_text)

In [8]:
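# Save the MCA questions in a Python (.py) file
with open("mca_questions.py", "w", encoding="utf-8") as py_file:
    py_file.write("mca_questions = [\n")
    for idx, question in enumerate(mca_questions):
        # repr() (via !r) escapes the embedded newlines and any quotes,
        # so the generated file is valid Python
        entry = f"Q{idx+1}: {question}"
        py_file.write(f"    {entry!r},\n")
    py_file.write("]\n")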

Step 1: Import required libraries
- The code starts by importing the necessary libraries and modules.
- `nltk` is the Natural Language Toolkit, used for text processing and tokenization.
- `pdfplumber` is the library we'll use for PDF text extraction.
- `itertools.combinations` is imported for generating combinations of elements.

Step 2: Download NLTK resources
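- The code then downloads the NLTK resources it needs: the `punkt` sentence tokenizer models and the `stopwords` corpus. Recent NLTK releases may additionally require the `punkt_tab` resource for `sent_tokenize` and `word_tokenize`.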

Step 3: Define the function `extract_text_from_pdf(pdf_path)`
- This function takes the file path of a PDF as input and returns the extracted text from the PDF.
- `pdfplumber.open(pdf_path)` opens the PDF file using `pdfplumber` and returns a PDF object.
- The `with` statement ensures that the PDF object is properly closed after processing.
- The function iterates through each page of the PDF using `pdf.pages`.
- For each page, it extracts the text using `page.extract_text()` and appends it to the `text` variable.
- Finally, the function returns the concatenated text from all the pages.
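
One detail the loop above glosses over is a page with no extractable text. The sketch below shows one way to handle it, assuming the per-page results arrive as a list in which some entries may be `None` or empty. `join_page_texts` is a hypothetical helper for illustration, not part of `pdfplumber`, and the `pages` list only simulates what `page.extract_text()` might return.

```python
from typing import Iterable, Optional


def join_page_texts(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text, treating None or empty pages as blank.

    Hypothetical helper: it only post-processes strings and does not
    call pdfplumber itself.
    """
    # Replace missing pages with "" and separate pages with a newline so
    # words at a page boundary are not glued together.
    return "\n".join(text or "" for text in page_texts)


# Simulated results of page.extract_text() for three pages (assumption)
pages = ["First page.", None, "Third page."]
combined = join_page_texts(pages)
print(repr(combined))
```

Joining with a newline rather than an empty string matters for the later sentence splitting, since a sentence ending one page would otherwise run straight into the first word of the next.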

Step 4: Define the function `get_mca_questions(context: str)`
- This function takes a text context as input and generates multiple-choice questions based on the content.
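- It splits the context into sentences with `sent_tokenize`, tokenizes each sentence with `word_tokenize`, then lowercases the tokens and drops English stopwords and non-alphabetic tokens.
- It then uses `combinations` to pair every two cleaned sentences. For each pair, it builds a question of the form "Which of the following are true about ...?" from the first sentence and lists both cleaned sentences as options a and b.
- The function returns the list of question strings. Because every pair of sentences yields one question, a text with n sentences produces n(n-1)/2 questions.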
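
The pairing step inside `get_mca_questions` can be sketched in isolation. This is an illustration under assumptions: the hard-coded `cleaned` list stands in for the NLTK output, and `build_questions` is a hypothetical name that mirrors the loop in the notebook cell.

```python
from itertools import combinations
from typing import List


def build_questions(cleaned_sentences: List[List[str]]) -> List[str]:
    """Pair every two cleaned sentences into one question string."""
    questions = []
    for i, j in combinations(range(len(cleaned_sentences)), 2):
        q1 = " ".join(cleaned_sentences[i])
        q2 = " ".join(cleaned_sentences[j])
        questions.append(
            f"Q: Which of the following are true about {q1}?\na. {q1}\nb. {q2}"
        )
    return questions


# Stand-in for tokenized sentences with stopwords and punctuation removed
cleaned = [
    ["cells", "store", "energy"],
    ["plants", "absorb", "light"],
    ["water", "moves", "roots"],
]
questions = build_questions(cleaned)
print(len(questions))  # 3 sentences give 3 * 2 / 2 = 3 questions
print(questions[0])
```

The number of pairs grows quadratically: 300 sentences would yield 300 * 299 / 2 = 44,850 questions, so sampling or filtering the pairs may be worthwhile for a full chapter.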

Step 5: Example Usage
- The code demonstrates an example usage of the functions.
- It sets the `pdf_path` variable to the path of the PDF file you want to process.
- It then calls the `extract_text_from_pdf` function with `pdf_path` as input, which extracts the text from the PDF.
- The extracted text is then passed to the `get_mca_questions` function, which generates multiple-choice questions based on the text context.
- Finally, it writes the generated questions to `mca_questions.py` as a Python list named `mca_questions`.
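
The questions contain newline characters, so they need escaping before they can be stored as Python string literals. The sketch below shows one way to write such a file and read it back, assuming `repr()` for escaping and `ast.literal_eval` for parsing. The sample questions and the temporary path are made up for illustration.

```python
import ast
import os
import tempfile

# Made-up questions, including one with embedded double quotes
sample_questions = [
    "Q: Which of the following are true about cells?\na. cells\nb. plants",
    'Q: Which of the following are true about "quoted" text?\na. x\nb. y',
]

path = os.path.join(tempfile.mkdtemp(), "mca_questions.py")
with open(path, "w", encoding="utf-8") as py_file:
    py_file.write("mca_questions = [\n")
    for question in sample_questions:
        # repr() emits a valid string literal, escaping newlines and quotes
        py_file.write(f"    {question!r},\n")
    py_file.write("]\n")

# Read the file back and parse the list literal after the first "=" sign
with open(path, encoding="utf-8") as py_file:
    source = py_file.read()
loaded = ast.literal_eval(source.split("=", 1)[1].strip())
print(loaded == sample_questions)
```

If the questions are meant for other tools rather than for `import`, a JSON file written with `json.dump` would be a reasonable alternative to a generated `.py` file.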

Overall, the code performs the following tasks:
1. Imports necessary libraries.
2. Downloads NLTK resources.
3. Defines a function to extract text from a PDF file using `pdfplumber`.
4. Defines a function to generate multiple-choice questions by pairing the cleaned sentences of a given text context.
5. Demonstrates example usage by extracting text from a PDF, generating multiple-choice questions from that text, and saving them to `mca_questions.py`.