# Term Search in Documents

## Objective
The goal of this exercise is to develop a simple information retrieval system that allows the user to search for a specific term across a set of text documents. This will introduce you to the basics of text processing and searching algorithms in the context of information retrieval.

## Problem Description
You are provided with a set of text documents. Your task is to implement a search function that:
- Takes a term entered by the user as the query.
- Searches for this term across all the provided documents.
- Returns a list of documents where the term appears.

## Requirements

### Step 1: Preparing the Data
- **Load the Documents**: You will start by loading the text documents into your program. These documents can be in plain text format stored in a directory.
- **Read Each Document**: Implement a function to read each document and store its contents in a data structure of your choice (e.g., a list, or a dictionary keyed by file name so that results can later be reported by name).
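
One possible way to approach Step 1 is sketched below. The function name `load_documents` and the demo file names are illustrative assumptions, not part of the exercise; the sketch assumes UTF-8 text files and stores them in a dictionary keyed by file name.

```python
import os
import tempfile


def load_documents(directory):
    """Return a dict mapping each file name in `directory` to its text."""
    documents = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        # errors="replace" keeps loading even if a file is not valid UTF-8
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            documents[name] = f.read()
    return documents


# Small demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("a.txt", "Hello world"), ("b.txt", "Goodbye")]:
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as f:
            f.write(text)
    demo_docs = load_documents(tmp)

print(sorted(demo_docs))
```

Reading every file into memory is convenient for a small collection; for a large one, scanning each file line by line may be the better choice.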

### Step 2: Implementing the Search
- **Input Query**: Implement a function to accept a query term from the user.
- **Search Function**: Create a function that:
  - Iterates through each document.
  - Checks if the query term appears in the document.
  - Optionally matches case-insensitively, which improves the user experience.
- **Return Results**: The function should return the names or identifiers of the documents where the term is found.
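
The search step could look roughly like the sketch below. It assumes the documents are already in memory as a dictionary of name to text; `search_documents` and its `case_sensitive` flag are hypothetical names chosen for illustration.

```python
def search_documents(documents, term, case_sensitive=False):
    """Return the sorted names of documents whose text contains `term`."""
    if not term:
        # An empty string is a substring of every text; treat it as "no query".
        return []
    needle = term if case_sensitive else term.casefold()
    matches = []
    for name, text in documents.items():
        haystack = text if case_sensitive else text.casefold()
        if needle in haystack:
            matches.append(name)
    return sorted(matches)


sample = {
    "doc1.txt": "The quick brown fox",
    "doc2.txt": "A QUICK reply",
    "doc3.txt": "Nothing relevant here",
}
print(search_documents(sample, "quick"))                       # ['doc1.txt', 'doc2.txt']
print(search_documents(sample, "quick", case_sensitive=True))  # ['doc1.txt']
```

Note that this is a substring match, so a query such as `art` would also match `party`. `str.casefold()` is used rather than `str.lower()` because it handles a few non-ASCII cases more aggressively.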

### Step 3: Displaying Results
- **Output the Results**: For each search query, output the results in a user-friendly format, listing the documents where the term was found, or a message indicating that the term does not appear in any document.
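
A minimal sketch of the display step follows. `format_results` is an assumed helper name; returning a string instead of printing directly is one design option that keeps the formatting easy to test.

```python
def format_results(term, matches):
    """Build a readable report for one query."""
    if not matches:
        return f"The term '{term}' does not appear in any document."
    lines = [f"The term '{term}' was found in {len(matches)} document(s):"]
    lines.extend(f"- {name}" for name in matches)
    return "\n".join(lines)


print(format_results("fox", ["doc1.txt", "doc4.txt"]))
print(format_results("zebra", []))
```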

## Evaluation Criteria
- **Correctness**: The search function should accurately identify documents containing the term.
- **Efficiency**: Efficiency is not critical for small datasets, but consider how your search scales: a linear scan rereads every document for every query, so its cost grows with the total size of the collection.
- **Usability**: The interface for inputting search terms and viewing results should be clear and easy to use.
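
For the efficiency criterion, one common alternative to rescanning every document per query is an inverted index, built once and then queried many times. The sketch below is an illustration under assumptions: `build_index` and `lookup` are hypothetical names, and tokenizing with `\w+` is a simplification. Unlike a substring scan, it matches whole words only.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"\w+", text.casefold()):
            index[word].add(name)
    return index


def lookup(index, term):
    """Return the sorted names of documents containing `term` as a whole word."""
    # .get() avoids creating an empty entry for unknown terms.
    return sorted(index.get(term.casefold(), set()))


docs = {
    "a.txt": "Hello world",
    "b.txt": "hello there",
    "c.txt": "Goodbye",
}
index = build_index(docs)
print(lookup(index, "HELLO"))  # ['a.txt', 'b.txt']
print(lookup(index, "hell"))   # [] because only whole words are indexed
```

Building the index costs one pass over the collection; after that, each lookup is a dictionary access rather than a full scan.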

## Additional Challenges (Optional)
- **Enhance the search functionality**: Allow for more complex queries, such as phrases or multiple terms.
- **Improve the search with regular expressions**: Use regex for pattern matching to enhance the flexibility of the search.
- **Implement a simple ranking system**: Rank the documents based on the frequency of the term within each document.
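
The regex and ranking challenges can be combined, as in the hedged sketch below. `rank_documents` is an assumed name; the pattern uses `\b` word boundaries, which behaves as expected for plain words and phrases but not for terms that begin or end with punctuation.

```python
import re


def rank_documents(documents, term):
    """Return (name, count) pairs sorted by descending whole-word frequency."""
    # re.escape() makes the user's term literal; IGNORECASE ignores case.
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    counts = {name: len(pattern.findall(text)) for name, text in documents.items()}
    ranked = [(name, n) for name, n in counts.items() if n > 0]
    # Highest count first; ties are broken alphabetically by name.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


corpus = {
    "a.txt": "cat cat dog",
    "b.txt": "Cat and category",
    "c.txt": "dog",
}
print(rank_documents(corpus, "cat"))  # [('a.txt', 2), ('b.txt', 1)]
```

Raw term frequency favors long documents; normalizing by document length would be a natural next refinement.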

This exercise will help you understand the fundamental mechanisms behind storing and retrieving text in information retrieval. By the end of this task, you will have a basic prototype that mimics the core functions of larger, more complex search engines.


In [None]:
import os

In [None]:
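def buscar_palabra_en_archivo(archivo, palabra):
    """Return True if `palabra` occurs as a substring of any line in `archivo`."""
    # Read line by line so large files are never fully loaded into memory.
    with open(archivo, 'r', encoding='utf-8') as f:
        for linea in f:
            if palabra in linea:
                return True  # stop at the first match
    return False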

In [None]:
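def buscar_palabra_en_carpeta(ruta_carpeta, palabra):
    """Return the names of the files in `ruta_carpeta` that contain `palabra`."""
    archivos_con_palabra = []
    # sorted() makes the result order deterministic; os.listdir() order is arbitrary.
    for nombre_archivo in sorted(os.listdir(ruta_carpeta)):
        archivo = os.path.join(ruta_carpeta, nombre_archivo)
        # Skip subdirectories and anything else that is not a regular file.
        if os.path.isfile(archivo) and buscar_palabra_en_archivo(archivo, palabra):
            archivos_con_palabra.append(nombre_archivo)
    return archivos_con_palabra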

In [None]:
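# Folder that holds the text documents to search.
carpeta = '../data'
# Prompt (Spanish): "Enter the word to search for: "
palabra_a_buscar = input("Ingrese la palabra a buscar: ")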

In [None]:
archivos_encontrados = buscar_palabra_en_carpeta(carpeta, palabra_a_buscar)
if archivos_encontrados:
    # List every matching file, then print the total count.
    print(f"La palabra '{palabra_a_buscar}' fue encontrada en los siguientes archivos:")
    for archivo in archivos_encontrados:
        print('-', archivo)
    print(f"La palabra '{palabra_a_buscar}' fue encontrada en {len(archivos_encontrados)} archivos.")
else:
    # Report that no file in the folder contains the term.
    print(f"La palabra '{palabra_a_buscar}' no fue encontrada en ninguno de los archivos en la carpeta '{carpeta}'.")

La palabra 'hola' fue encontrada en los siguientes archivos:
- pg100.txt
- pg10676.txt
- pg1184.txt
- pg120.txt
- pg1259.txt
- pg1260.txt
- pg1400.txt
- pg145.txt
- pg1727.txt
- pg174.txt
- pg18893.txt
- pg1998.txt
- pg2000.txt
- pg205.txt
- pg21012.txt
- pg2160.txt
- pg21700.txt
- pg25344.txt
- pg2554.txt
- pg2591.txt
- pg2600.txt
- pg26073.txt
- pg2641.txt
- pg2701.txt
- pg27827.txt
- pg28054.txt
- pg2814.txt
- pg2852.txt
- pg29728.txt
- pg30254.txt
- pg3207.txt
- pg345.txt
- pg37106.txt
- pg408.txt
- pg4085.txt
- pg41070.txt
- pg41287.txt
- pg42933.txt
- pg4300.txt
- pg44388.txt
- pg44837.txt
- pg45.txt
- pg45540.txt
- pg45848.txt
- pg47312.txt
- pg47629.txt
- pg47948.txt
- pg48191.txt
- pg50038.txt
- pg514.txt
- pg5197.txt
- pg52882.txt
- pg59468.txt
- pg59469.txt
- pg6130.txt
- pg61419.txt
- pg62119.txt
- pg64317.txt
- pg6593.txt
- pg6761.txt
- pg73447.txt
- pg73448.txt
- pg74.txt
- pg768.txt
- pg844.txt
- pg8800.txt
- pg98.txt
- pg996.txt
La palabra 'hola' fue encontrada en 68 ar