# Lab 11: Designing a System for Checking the Adequacy of Text Data Using Machine Learning
### Шевченко Юлія, ФІ-31мн

In this project, we design and implement a simple system that assesses the adequacy of textual data, that is, whether a piece of text looks like meaningful English or like random character noise. The approach builds a frequency table of letter combinations from a reference text, scores new texts against that table, and compares each score with a threshold derived from labeled examples.

In [1]:
# All the imports for the task
import re

import numpy as np

In [2]:
# Function to preprocess text: convert to lowercase and remove punctuation
def preprocess_text(text):
    """
    Preprocesses text by converting it to lowercase and removing punctuation.

    Args:
    - text (str): The input text to be preprocessed.

    Returns:
    - str: Processed text with lowercase letters and no punctuation.
    """
    text = text.lower()  # Convert text to lowercase
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)  # Remove punctuation and special characters
    return text
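
To make the effect of `preprocess_text` concrete, the sketch below repeats the function and applies it to a few inputs. The sample strings are illustrative assumptions (the first one also appears among the test texts later in the notebook). Note that the pattern keeps only ASCII letters, digits, and whitespace, so apostrophes vanish and non-Latin text is removed entirely.

```python
import re

def preprocess_text(text):
    """Same logic as the notebook function: lowercase, then drop non-alphanumeric characters."""
    text = text.lower()
    return re.sub(r'[^a-zA-Z0-9\s]', '', text)

# Punctuation and the apostrophe are removed; spaces are kept.
print(preprocess_text("I didn't want to die alone."))  # i didnt want to die alone

# Accented and non-Latin letters fall outside a-z, so they are dropped too.
print(preprocess_text("Caf\u00e9"))  # caf
print(repr(preprocess_text("Привіт")))  # ''
```

In practice this means the method, as written, only applies to English (or other ASCII-only) text.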

In [3]:
# Function to count letter combinations of length n in text
def count_letter_combinations(text, n):
    """
    Counts occurrences of letter combinations of length n in the given text.

    Args:
    - text (str): The input text to analyze.
    - n (int): Length of the letter combinations.

    Returns:
    - dict: Dictionary where keys are letter combinations and values are their frequencies.
    """
    text = preprocess_text(text)
    combinations = {}
    for i in range(len(text) - n + 1):
        combination = text[i:i+n]
        if not any(ch.isspace() for ch in combination):  # Skip combinations that span whitespace (spaces, newlines)
            if combination in combinations:
                combinations[combination] += 1
            else:
                combinations[combination] = 1
    return combinations
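
The counting step can also be written more compactly. The following is a minimal sketch, not the notebook's own implementation: `ngram_counts_sketch` is a hypothetical helper that splits the cleaned text into words and counts the combinations inside each word with `collections.Counter`, which gives the same result as sliding a window over the text and skipping windows that contain whitespace.

```python
import re
from collections import Counter

def ngram_counts_sketch(text, n=2):
    """Count n-letter combinations inside words (hypothetical helper, not from the notebook)."""
    cleaned = re.sub(r'[^a-z0-9\s]', '', text.lower())
    counts = Counter()
    for word in cleaned.split():
        # Slide a window of length n over each word separately
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return dict(counts)

example = ngram_counts_sketch("The cat, the hat!")
print(example)  # {'th': 2, 'he': 2, 'ca': 1, 'at': 2, 'ha': 1}
```

Pairs such as `'ec'` (from "the cat") never appear, because a combination is only counted when it lies entirely within one word.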

In [4]:
# Initialize variables
n = 2  # Length of letter combinations to analyze
text="""Artificial Intelligence (AI) and Robotics are two distinct yet increasingly interconnected fields that are revolutionizing numerous industries and aspects of daily life. While AI focuses on developing algorithms and computational models to simulate human intelligence and automate tasks, Robotics deals with the design, construction, operation, and application of robots. The synergy between AI and Robotics is leading to transformative advancements, enabling robots to perceive, learn, and interact with the environment and humans in more sophisticated ways. This paper explores the collaborative potential of AI and Robotics, highlighting how each field enhances the other's capabilities and impacts. Firstly, it examines AI-driven advancements in Robotics, showcasing how AI techniques such as machine learning, computer vision, and natural language processing are enabling robots to perform tasks with greater autonomy, adaptability, and efficiency. Examples include autonomous vehicles, robotic assistants in healthcare, and smart manufacturing systems. Secondly, the paper discusses Robotics as a platform for AI research and experimentation. Robots provide a physical embodiment for AI algorithms, allowing researchers to test and refine their models in real-world environments. This synergy has led to breakthroughs in AI, such as improved perception, navigation, and human-robot interaction. Furthermore, the paper explores collaborative projects and technologies that exemplify the mutual enhancement of AI and Robotics. These include AI-powered drones for search and rescue missions, robotic exoskeletons for rehabilitation, and collaborative robots (cobots) in industrial settings. These examples demonstrate how the integration of AI and Robotics is reshaping industries and expanding the possibilities of what robots can achieve. In conclusion, the collaborative potential of AI and Robotics is vast and promising. 
As both fields continue to advance, the synergy between them will lead to even more innovative applications and solutions. Understanding and harnessing this synergy is key to unlocking the full potential of AI and Robotics and driving forward the next wave of technological evolution. Artificial Intelligence (AI) and Robotics are two fields at the forefront of technological innovation, with each playing a significant role in shaping the future. AI focuses on creating intelligent machines capable of learning, reasoning, and problem-solving, while Robotics deals with the design, construction, operation, and application of robots. The intersection of these fields, known as AI Robotics or RoboAI, is where the true potential for collaboration lies. Collaboration in the context of AI and Robotics refers to the integration of AI technologies into robotic systems to enhance their capabilities. This collaboration enables robots to perceive and interact with their environment more effectively, make decisions autonomously, and adapt to changing circumstances. Importantly, collaboration between AI and Robotics is not just about combining two technologies; it is about creating synergies that amplify the strengths of each field. The importance of collaboration between AI and Robotics cannot be overstated. It has the potential to revolutionize industries such as manufacturing, healthcare, transportation, and agriculture, among others. By working together, AI and Robotics can tackle complex problems that neither field could solve alone, leading to more efficient processes, safer work environments, and improved quality of life. Thesis Statement: The collaborative potential of AI and Robotics holds great promise for advancing technology and solving complex problems. By leveraging the strengths of each field, we can create intelligent robotic systems that are capable of addressing the challenges of the 21st century.  
Enhanced Autonomy: Future advancements in AI and robotics are expected to lead to robots with greater autonomy, allowing them to perform complex tasks with minimal human intervention. This includes robots that can learn from their experiences, adapt to new environments, and collaborate more effectively with humans and other robots. Human-Robot Interaction: Improving human-robot interaction is a key area for future advancements. This includes developing robots that can understand and respond to human emotions, intentions, and preferences, leading to more natural and intuitive interactions. Multi-Robot Systems: Future advancements may also include the development of multi-robot systems that can collaborate and coordinate their actions to achieve common goals. This could lead to more efficient and flexible robotic systems for tasks such as search and rescue, disaster response, and construction. Technical Challenges: Technical challenges include developing AI algorithms that can handle the complexities of real-world environments, such as uncertainty, variability, and dynamic changes. Additionally, ensuring the safety and reliability of AI-driven robotic systems remains a major challenge. Ethical Challenges: Ethical challenges include ensuring that AI-driven robotic systems are used in a fair and responsible manner, avoiding bias and discrimination. Additionally, there are concerns about the impact of automation on employment and the workforce, as well as privacy and security issues related to the use of AI and robotics in various applications.  Regulatory Challenges: Regulatory challenges include developing appropriate regulations and standards for the design, development, and deployment of AI-driven robotic systems. This includes addressing issues such as liability, accountability, and safety standards.  Learning and Adaptation: Further research is needed to develop AI algorithms that can learn and adapt in real-time to new environments and tasks. 
This includes research in areas such as reinforcement learning, transfer learning, and lifelong learning. Human-Robot Collaboration: Opportunities exist for further research in human-robot collaboration, including developing robots that can collaborate more effectively with humans and other robots. This includes research in areas such as shared autonomy, task allocation, and team coordination. In conclusion, collaborative AI and robotics hold great promise for the future, but also present significant challenges that must be overcome. By addressing these challenges and seizing opportunities for further research and development, we can unlock the full potential of AI-driven robotics and create a future where humans and robots collaborate seamlessly to improve our lives. In this exploration of the collaborative potential of AI and Robotics, several key points have emerged. Firstly, AI enhances robotics by improving perception, decision-making, and autonomy, leading to more capable and intelligent robots. Secondly, robotics provides a physical platform for AI research and experimentation, enabling the testing of AI algorithms in real-world scenarios. Thirdly, collaborative projects between AI and Robotics are driving innovation in various industries, from manufacturing to healthcare, leading to more efficient and safer work environments. Finally, ethical and societal implications, such as the impact on employment and workforce, the need for ethical guidelines, and ensuring safety and trust in collaborative systems, must be carefully considered. The collaborative potential of AI and Robotics is vast and promising. By leveraging the strengths of each field, we can create intelligent robotic systems that are capable of addressing the challenges of the 21st century. AI enhances robotics by providing advanced algorithms for perception, decision-making, and learning, while robotics provides a physical platform for AI research and experimentation. 
Together, AI and Robotics are driving innovation and transforming industries, leading to more efficient processes, safer work environments, and improved quality of life. To foster collaboration in AI and Robotics, researchers, developers, and policymakers must work together to address key challenges and seize opportunities for further research and development. This includes developing advanced AI algorithms for robotics, designing robots that can collaborate more effectively with humans, and ensuring ethical and responsible use of AI-driven robotic systems. By working together, we can unlock the full potential of AI and Robotics and create a future where humans and robots collaborate seamlessly to improve our lives."""

In [5]:
# Calculate letter combinations and their frequencies
combination_counts = count_letter_combinations(text, n)
print("Letter combinations of length", n, "and their frequencies:")
print(combination_counts)

Letter combinations of length 2 and their frequencies:
{'ar': 53, 'rt': 15, 'ti': 145, 'if': 10, 'fi': 22, 'ic': 74, 'ci': 13, 'ia': 18, 'al': 72, 'in': 160, 'nt': 82, 'te': 80, 'el': 40, 'll': 55, 'li': 40, 'ig': 16, 'ge': 26, 'en': 108, 'nc': 42, 'ce': 37, 'ai': 50, 'an': 164, 'nd': 115, 'ro': 109, 'ob': 73, 'bo': 99, 'ot': 87, 'cs': 33, 're': 107, 'tw': 8, 'wo': 14, 'di': 20, 'is': 41, 'st': 51, 'ct': 27, 'ye': 1, 'et': 21, 'cr': 8, 'ea': 65, 'as': 33, 'si': 33, 'ng': 96, 'gl': 1, 'ly': 19, 'er': 77, 'rc': 24, 'co': 51, 'on': 107, 'nn': 7, 'ne': 21, 'ec': 25, 'ed': 20, 'ie': 35, 'ld': 16, 'ds': 7, 'th': 132, 'ha': 45, 'at': 95, 'ev': 21, 'vo': 4, 'ol': 41, 'lu': 19, 'ut': 27, 'io': 65, 'ni': 21, 'iz': 4, 'zi': 2, 'nu': 5, 'um': 16, 'me': 32, 'ou': 14, 'us': 24, 'du': 6, 'tr': 19, 'ri': 42, 'es': 104, 'sp': 6, 'pe': 19, 'ts': 37, 'of': 39, 'da': 8, 'il': 14, 'fe': 17, 'wh': 7, 'hi': 28, 'le': 62, 'fo': 33, 'oc': 10, 'cu': 8, 'se': 39, 'de': 45, 've': 57, 'lo': 32, 'op': 19, 'pi': 8, 

In [6]:
# Function to evaluate a test text based on combination counts
def evaluation(test_text, combination_counts):
    """
    Evaluates the adequacy score of test_text based on precomputed combination counts.

    Args:
    - test_text (str): The text to evaluate.
    - combination_counts (dict): Dictionary of letter combinations and their frequencies.

    Returns:
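    - int: 1 plus the summed reference frequencies of the distinct combinations found in test_text.
    """
    # count_letter_combinations preprocesses its input, so the raw text is passed directly
    test_combinations = count_letter_combinations(test_text, n)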
    score = 1  # Initialize score
    for combination in test_combinations.keys():
        if combination in combination_counts:
            score += combination_counts[combination]
    return score
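
A small worked example may help show what the score measures. The sketch below is self-contained and uses compact stand-ins for the notebook functions (`count_pairs` and `score_text` are hypothetical names), together with a toy reference table whose values are invented for illustration.

```python
import re

def count_pairs(text, n=2):
    """Count n-letter combinations inside words (compact stand-in for the notebook function)."""
    cleaned = re.sub(r'[^a-z0-9\s]', '', text.lower())
    counts = {}
    for word in cleaned.split():
        for i in range(len(word) - n + 1):
            pair = word[i:i + n]
            counts[pair] = counts.get(pair, 0) + 1
    return counts

def score_text(test_text, reference_counts, n=2):
    """1 plus the reference frequency of every distinct combination found in test_text."""
    score = 1
    for pair in count_pairs(test_text, n):
        score += reference_counts.get(pair, 0)
    return score

# Toy reference table (hypothetical values, not taken from the notebook output)
toy_reference = {'th': 5, 'he': 3, 'at': 2}

print(score_text("the", toy_reference))      # 9  (1 + 5 + 3)
print(score_text("the the", toy_reference))  # 9  (repeated combinations are counted once)
print(score_text("xq zk", toy_reference))    # 1  (no known combinations)
```

Because the loop runs over distinct combinations, repeating a word does not raise the score, but adding new words with known combinations does, so longer texts tend to score higher.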

In [7]:
# Test data for evaluation
harmful = [
    "Hf djfnjdk skjdsnd jhhjd bn on tdsm skddo.",
    "Kdfd ksdnnj kjij onrnehsd lsdnk xoncbyeda.",
    "Odfjdf of jnsdg asdrvghsd kksdjn odnb chbdsgv erjnas.",
    "Jjfjdns in asdtvasd ertgf.",
    "Pkdsdks osdb of asdjbad qaasdpps."
]

correct = [
    "As the rental car rolled to a stop on the dark road, her fear increased by the moment.",
    "Be careful with that butter knife.",
    "He dreamed of leaving his law firm to open a portable dog wash.",
    "The crowd yells and screams for more memes.",
    "He decided to fake his disappearance to avoid jail.",
]

In [8]:
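# Score every labeled example; `sample` is used as the loop variable so the
# reference `text` defined above is not overwritten
bag = [evaluation(sample, combination_counts) for sample in harmful + correct]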

In [9]:
# Calculate threshold as the mean of all scores
threshold = np.mean(bag)
print("Threshold score:", threshold)

Threshold score: 735.5
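
Why the mean of all scores works as a threshold can be illustrated with a few numbers. In the sketch below, the score lists are hypothetical and `average` is a helper introduced only for this example. When both classes have the same number of examples, the overall mean equals the midpoint between the two class means; when the classes are unbalanced, the threshold drifts toward the larger class.

```python
def average(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

# Hypothetical scores, chosen only for illustration
noise_scores = [200, 400, 600]
sentence_scores = [1000, 1200, 1400]

# Balanced classes: the overall mean is the midpoint of the class means
balanced_threshold = average(noise_scores + sentence_scores)
midpoint = (average(noise_scores) + average(sentence_scores)) / 2
print(balanced_threshold, midpoint)  # 800.0 800.0

# Unbalanced classes: extra noise examples pull the threshold down
more_noise = noise_scores + [200, 400]
unbalanced_threshold = average(more_noise + sentence_scores)
unbalanced_midpoint = (average(more_noise) + average(sentence_scores)) / 2
print(unbalanced_threshold, unbalanced_midpoint)  # 675.0 780.0
```

The notebook uses five examples of each kind, so its threshold corresponds to the balanced case.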


In [10]:
# Function to make a decision based on a text's score compared to threshold
def make_decision(text, threshold=735.5):
    """
    Makes a decision based on the adequacy score of the text compared to a threshold.

    Args:
    - text (str): The text to evaluate.
    - threshold (float): Threshold score for decision-making.

    Returns:
    - tuple: (score, decision) where score is the adequacy score and decision is a boolean indicating if the score is above the threshold.
    """
    score = evaluation(text, combination_counts)
    return score, score > threshold

In [11]:
# Test texts for decision-making
test_texts = [
    "Acmksmcks znajnzja asdnc abdfcp skoog.",
    "Psand xbxt osnd over kmsjbn asnzo.",
    "Yisd zxlkzxp zlnzx lord eoty mxnjp zonk.",
    "Onzb york forjot donkwoq aongrot kmasdjz lsmdnaad pohuq akap.",
    "Yamzu sxok xuziko aoplot skmjbasdp asdnadjfo porbin axmxap awert.",
    "The glacier came alive as the climbers hiked closer.",
    "Homesickness became contagious in the young campers' cabin.",
    "I used to practice weaving with spaghetti three hours a day but stopped because I didn't want to die alone.",
    "Patricia found the meaning of life in a bowl of Cheerios.",
    "Stop waiting for exceptional things to just happen."
]

In [12]:
# Evaluate each test text and make a decision
for test_text in test_texts:
    print("\nText:", test_text)
    score, decision = make_decision(test_text)
    print("Score:", score)
    print("Decision:", decision)


Text: Acmksmcks znajnzja asdnc abdfcp skoog.
Score: 190
Decision: False

Text: Psand xbxt osnd over kmsjbn asnzo.
Score: 500
Decision: False

Text: Yisd zxlkzxp zlnzx lord eoty mxnjp zonk.
Score: 396
Decision: False

Text: Onzb york forjot donkwoq aongrot kmasdjz lsmdnaad pohuq akap.
Score: 746
Decision: True

Text: Yamzu sxok xuziko aoplot skmjbasdp asdnadjfo porbin axmxap awert.
Score: 715
Decision: False

Text: The glacier came alive as the climbers hiked closer.
Score: 909
Decision: True

Text: Homesickness became contagious in the young campers' cabin.
Score: 1456
Decision: True

Text: I used to practice weaving with spaghetti three hours a day but stopped because I didn't want to die alone.
Score: 2138
Decision: True

Text: Patricia found the meaning of life in a bowl of Cheerios.
Score: 1618
Decision: True

Text: Stop waiting for exceptional things to just happen.
Score: 1468
Decision: True
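
The printed results can be summarized with a quick tally. This sketch copies the scores from the output above and assumes, based on how the test texts look, that the first five are random strings and the last five are genuine sentences; the notebook itself does not label them.

```python
# Scores copied from the notebook output above
scores = [190, 500, 396, 746, 715, 909, 1456, 2138, 1618, 1468]
# Assumed labels: first five texts are random strings, last five are sentences
expected = [False] * 5 + [True] * 5
threshold = 735.5

predicted = [score > threshold for score in scores]
n_correct = sum(p == e for p, e in zip(predicted, expected))
false_positives = [s for s, p, e in zip(scores, predicted, expected) if p and not e]

print("Accuracy:", n_correct / len(scores))  # Accuracy: 0.9
print("False positives:", false_positives)   # False positives: [746]
```

Under that labeling, nine of the ten texts are classified as expected, and the single miss is the random string that scored 746, just above the threshold.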


### Conclusion

In this project, we developed a Jupyter Notebook in Python that assesses the adequacy of textual data with a simple frequency-based scoring method. The main steps and their purpose are summarized below:

1.  **Text Preprocessing and Analysis**: The `preprocess_text` function converts text to lowercase and removes every character that is not an ASCII letter, a digit, or whitespace. This step ensures uniformity and eliminates punctuation that would otherwise distort the frequency counts.

2.  **Letter Combination Frequency Analysis**: The `count_letter_combinations` function computes the frequencies of letter combinations of a specified length (`n`) in the text, skipping combinations that span a space. With `n = 2`, applying it to the reference text yields a table of how often each letter pair occurs in ordinary English prose.

3.  **Adequacy Evaluation**: The `evaluation` function scores a test text by summing, for each distinct letter combination it contains, that combination's frequency in the reference text. Texts built from common English letter pairs score high, while random character strings score low. Because the score is a sum rather than an average, longer texts tend to score higher.

4.  **Decision-Making**: The threshold (735.5) is the mean score of five known harmful and five known correct texts, and the `make_decision` function marks a text as adequate when its score exceeds that threshold. On the ten test texts, all five real sentences were accepted and four of the five nonsense strings were rejected; one nonsense string scored 746, just above the threshold. This kind of decision rule can automate quality control in applications such as content filtering.

5.  **Significance**: This project shows that even a simple statistical model of letter combinations can separate meaningful text from character noise in most cases. Automating this check can help researchers, educators, and content moderators filter out meaningless text-based data.

6.  **Future Directions**: Further enhancements could include integrating more sophisticated machine learning models for text classification or expanding the analysis to handle larger datasets and more diverse linguistic features.

In conclusion, this project demonstrates how Python and Jupyter Notebook can be used to build a lightweight tool for evaluating the adequacy of text data. The method is easy to automate and gave the expected result for nine of the ten test texts, although the borderline case shows that the threshold and the scoring rule leave room for refinement.