@pedruck pedruck commented Apr 29, 2025

The following changes were implemented:

  • The data pulled from the "scripts" database was compared against the data obtained from the provided 2025.1 sheet. Based on this comparison, a new boolean parameter named "ACTIVE" is assigned in the "teacher" collection, indicating whether the teacher is active (true) or inactive (false). It is set to false when a teacher who was previously in the database is not present in the new 2025.1 sheet.

  • New parameters were added to the "offers" collection that indicate whether the offer was already in the database (tagged with ano: 2024 and semestre: 2) or came from the new 2025.1 sheet (tagged with ano: 2025 and semestre: 1).

  • The code was consolidated so that it can be run from a single file (main.py).
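As a rough illustration of the first two changes, the comparison can be sketched as follows. This is a minimal sketch, not the code in this PR: the function names are hypothetical, and matching teachers on an `_id` field is an assumption. The `ANO` and `SEMESTRE` field names follow the snippet discussed in the review.

```python
def mark_active_teachers(old_teachers, new_teachers, key="_id"):
    """Merge teacher records and set a boolean ACTIVE field.

    Assumption: teachers are matched on `key` ("_id" here is a guess).
    """
    new_keys = {teacher[key] for teacher in new_teachers}
    # Everyone present in the new sheet is active.
    merged = [{**teacher, "ACTIVE": True} for teacher in new_teachers]
    # Teachers that only exist in the old database are kept but deactivated.
    for teacher in old_teachers:
        if teacher[key] not in new_keys:
            merged.append({**teacher, "ACTIVE": False})
    return merged


def tag_offers(offers, year, semester):
    """Return copies of the offers tagged with the given year and semester."""
    return [{**offer, "ANO": year, "SEMESTRE": semester} for offer in offers]
```

Under these assumptions, offers already in the database would be tagged with `tag_offers(old_offers, 2024, 2)` and offers from the sheet with `tag_offers(new_offers, 2025, 1)`.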

NOTE: The database that is compared against the sheet and then updated is defined in .env (it can be "scripts" or "shared-resources").

  • The only collections updated are:
  1. offers and teachers (updated as described above)
  2. disciplines (updated only with the information obtained from the sheet)
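One way to guard the database choice read from .env is sketched below. The variable name `DATABASE_NAME` and the helper are hypothetical and do not come from this PR; only the two allowed values are taken from the note above.

```python
import os

# The two database names mentioned in the PR description.
ALLOWED_DATABASES = ("scripts", "shared-resources")


def resolve_database_name(env=None, variable="DATABASE_NAME"):
    """Read the target database name and reject unexpected values.

    Assumption: the name is stored in an environment variable called
    DATABASE_NAME (hypothetical; the real variable name may differ).
    """
    env = os.environ if env is None else env
    name = env.get(variable, "").strip()
    if name not in ALLOWED_DATABASES:
        raise ValueError(
            f"{variable} must be one of {ALLOWED_DATABASES}, got {name!r}"
        )
    return name
```

Failing early on an unknown name avoids comparing against, or uploading to, an unintended database.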

Execution logic:

  1. JSON files obtained from the sheet are stored in the jsonfiles directory
  2. JSON files obtained from the database are stored in the oldjsonfiles directory
  3. JSON files produced by processing, updating, and comparing the data in the corresponding JSON files from the two directories above, which are later sent to the chosen database (scripts or shared-resources), are stored in the comparedjsonfiles directory
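The three-directory flow above can be sketched roughly as below. This is an illustrative outline only: `build_compared_files` and `load_json` are hypothetical helpers, and the `merge` callback stands in for whatever per-collection comparison main.py actually performs.

```python
import json
from pathlib import Path


def load_json(path):
    """Load one JSON file as Python data."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def build_compared_files(new_dir, old_dir, out_dir, merge):
    """For each JSON file in new_dir, merge it with its counterpart in old_dir.

    `merge(new_docs, old_docs)` is a caller-supplied function (hypothetical).
    Returns the names of the files written to out_dir.
    """
    new_dir, old_dir, out_dir = Path(new_dir), Path(old_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for new_path in sorted(new_dir.glob("*.json")):
        old_path = old_dir / new_path.name
        # A collection that is missing from the database is treated as empty.
        old_docs = load_json(old_path) if old_path.exists() else []
        merged = merge(load_json(new_path), old_docs)
        with open(out_dir / new_path.name, "w", encoding="utf-8") as fh:
            json.dump(merged, fh, ensure_ascii=False, indent=2)
        written.append(new_path.name)
    return written
```

With this shape, a call such as `build_compared_files("jsonfiles", "oldjsonfiles", "comparedjsonfiles", merge)` would mirror the directory names listed above.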

pedruck added 4 commits April 27, 2025 17:39
comparison between active and inactive teachers done. (active teachers will have STATUS: ACTIVE while inactive ones will have STATUS: DEACTIVE)
code still incomplete in some parts
preparing for the PR
@pedruck pedruck requested a review from danrleypereira April 29, 2025 04:27
@GuiMcs00 GuiMcs00 requested review from Copilot and removed request for danrleypereira May 3, 2025 19:23
Copilot AI left a comment

Pull Request Overview

This pull request implements data comparison and normalization changes to support shared resource updates, integrating information from both database scripts and an updated sheet. Key changes include:

  • Refactoring discipline normalization into a new class structure.
  • Orchestrating JSON generation, comparison, and upload via a consolidated main.py.
  • Adjusting file paths and parameter values (e.g., year/semester) to align with the new sheet for offers and teachers.

Reviewed Changes

Copilot reviewed 36 out of 49 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| processamento_planilha/normalize.py | Introduces a NormalizeDisciplinas class to encapsulate discipline normalization with an updated file path. |
| processamento_planilha/main.py | Adds a main orchestrator that sequentially calls JSON generation, comparison, normalization, and upload. |
| processamento_planilha/load_sheet_to_mongodb.py | Updates file path and adjusts static year/semester values for sheet data. |
| processamento_planilha/get_old_collections_from_mongo.py | Extracts collections from MongoDB with minor formatting improvements. |
| processamento_planilha/extract_collections.py | Provides comprehensive data extraction and JSON generation from an Excel sheet. |
| processamento_planilha/comparando_jsons.py | Implements the ComparisonModule for merging JSON data with deactivation logic. |
| extract_collections.py | Removes duplicate extraction logic to consolidate functionality. |
Files not reviewed (13)
  • .env.example: Language not supported
  • Dockerfile: Language not supported
  • new_collection/CAMPUS.json: Language not supported
  • new_collection/PERIODOS.json: Language not supported
  • new_collection/SALAS.json: Language not supported
  • processamento_planilha/comparedjsonfiles/CAMPUS.json: Language not supported
  • processamento_planilha/comparedjsonfiles/CURSOS.json: Language not supported
  • processamento_planilha/comparedjsonfiles/PERIODOS.json: Language not supported
  • processamento_planilha/jsonfiles/CAMPUS.json: Language not supported
  • processamento_planilha/jsonfiles/CURSOS.json: Language not supported
  • processamento_planilha/jsonfiles/PERIODOS.json: Language not supported
  • processamento_planilha/normalize.py~: Language not supported
  • processamento_planilha/oldjsonfiles/CAMPUS.json: Language not supported

Comment on lines +46 to +48
class NormalizeDisciplinas:
    @staticmethod
    def normalize_disciplinas():
Copilot AI May 3, 2025

[nitpick] Consider adding a docstring for the NormalizeDisciplinas class and its normalize_disciplinas method to clearly describe its purpose and expected behavior.

Suggested change
- class NormalizeDisciplinas:
-     @staticmethod
-     def normalize_disciplinas():
+ class NormalizeDisciplinas:
+     """
+     A class responsible for normalizing and updating the JSON file containing discipline data.
+     It processes the data by grouping documents with the same `_id` and consolidating their
+     `COD_CURS` values into a unique list.
+     """
+     @staticmethod
+     def normalize_disciplinas():
+         """
+         Normalizes the discipline data in the JSON file.
+         This method performs the following steps:
+         1. Loads the discipline data from the JSON file.
+         2. Groups documents by `_id` and consolidates their `COD_CURS` values.
+         3. Saves the updated data back to the JSON file.
+         """
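The grouping step named in the suggested docstring could look roughly like the sketch below. It is not the method's actual body, which is not shown in this conversation: the helper name is hypothetical, and handling `COD_CURS` as either a single value or a list is an assumption.

```python
def consolidate_disciplines(documents):
    """Group documents by `_id` and collect their COD_CURS values without repeats.

    Assumption: COD_CURS may be a single value or a list on each input document.
    """
    grouped = {}
    for doc in documents:
        # The first document seen for an _id provides the other fields.
        entry = grouped.setdefault(doc["_id"], {**doc, "COD_CURS": []})
        codes = doc.get("COD_CURS", [])
        if not isinstance(codes, list):
            codes = [codes]
        for code in codes:
            if code not in entry["COD_CURS"]:
                entry["COD_CURS"].append(code)
    return list(grouped.values())
```

Keeping the codes in first-seen order makes the output stable across runs, which helps when diffing the generated JSON files.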

Copilot uses AI. Check for mistakes.

  # Load the Excel file
- file_path = 'sheet.xlsx'  # Replace with your actual file path
+ file_path = '../sheet.xlsx'  # Replace with your actual file path
Copilot AI May 3, 2025

[nitpick] Using a relative path here may lead to issues if the script is executed from an unexpected directory; consider using a configurable base directory or an absolute path to ensure reliable file access.

Suggested change
- file_path = '../sheet.xlsx'  # Replace with your actual file path
+ file_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), '../sheet.xlsx')  # Construct absolute path

Comment on lines +55 to +60
for offer in newjson:
    offer['ANO'] = 2025
    offer['SEMESTRE'] = 1

resultjson = newjson + oldjson

Copilot AI May 3, 2025

The merging of new and old offer JSON data in GenerateComparedOffers could result in duplicate entries if an offer exists in both; consider implementing a de-duplication step before writing the result.

Suggested change
- for offer in newjson:
-     offer['ANO'] = 2025
-     offer['SEMESTRE'] = 1
- resultjson = newjson + oldjson
+ # Update new offers with the specified year and semester
+ for offer in newjson:
+     offer['ANO'] = 2025
+     offer['SEMESTRE'] = 1
+ # Use a dictionary to de-duplicate offers based on a unique key
+ offers_dict = {}
+ for offer in newjson + oldjson:
+     unique_key = f"{offer['ID']}"  # Replace 'ID' with the actual unique field(s)
+     offers_dict[unique_key] = offer
+ # Convert the dictionary values back to a list
+ resultjson = list(offers_dict.values())
+ # Write the de-duplicated result to the output file
