#Leveling Tests - Software Development Internship - Intuitive Care

Author: Victor Mafra de Holanda Ferraz \\
email: vmhf@ic.ufal.br \\
links: [LinkedIn](https://www.linkedin.com/in/victor-mafra-de-holanda-ferraz-b7a813200/) [GitHub](https://github.com/MafraV)

##Test 1 - Web Scraping

###Imports

In [17]:
import requests
from bs4 import BeautifulSoup

###Function Definitions

In [18]:
#Finds the first link on an HTML page whose href contains both of the given substrings
def find_link(url, condition1, condition2):
    response = requests.get(url) #Fetches the page
    html_content = response.content #Gets the raw HTML content of the page
    soup = BeautifulSoup(html_content, "html.parser") #Creates a BeautifulSoup object from the HTML content
    links = soup.find_all('a') #Finds all the <a> tags in the page
    for link in links: #Goes through all the <a> tags found previously
        new_link = link.get('href') #Gets the href of the current <a> tag (None if the tag has no href)
        if new_link and (condition1 in new_link) and (condition2 in new_link): #Skips tags without an href and tests whether both substrings are present in the link
            return new_link #Returns the first matching url
    return None #No link contained both substrings

#Downloads a PDF file from a url and saves it at the given path
def download_pdf(url, path):
    response = requests.get(url) #Requests the file
    response.raise_for_status() #Raises an exception on an HTTP error, so an error page is not saved as the PDF
    with open(path, 'wb') as f: #Opens the destination file in binary write mode
        done = f.write(response.content) #Writes the PDF content to the path; write() returns the number of bytes written
        if done: print('PDF downloaded with success!') #If at least one byte was written, prints a success message
        else: print('PDF download failed!') #Otherwise, prints a message saying that the download has failed
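
The link search above boils down to two steps: skip anchors that have no usable `href`, then return the first one containing both substrings. The sketch below reproduces that logic on a made-up list of hrefs, with no network access, and also shows how a relative href could be resolved against the page url with the standard library's `urljoin`. The helper name `first_matching_href` and the `example.org` urls are assumptions for illustration, not part of the original notebook.

```python
from urllib.parse import urljoin

def first_matching_href(hrefs, condition1, condition2):
    """Hypothetical helper: return the first href containing both substrings, or None."""
    for href in hrefs:
        # href may be None when an <a> tag has no href attribute
        if href and condition1 in href and condition2 in href:
            return href
    return None

# Made-up hrefs, as they might come from link.get('href')
hrefs = [None, "#top", "/ans/tiss-2020", "/ans/padrao-tiss-2021"]
match = first_matching_href(hrefs, "tiss", "2021")

# Relative hrefs need the page url as a base before they can be requested
page_url = "https://example.org/ans/index"
absolute = urljoin(page_url, match)
missing = first_matching_href(hrefs, "tiss", "1999")
```

If the hrefs on the target page are already absolute, as appears to be the case in this notebook, `urljoin` returns them unchanged.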

###Main

In [19]:
url = "https://www.gov.br/ans/pt-br/assuntos/prestadores/padrao-para-troca-de-informacao-de-saude-suplementar-2013-tiss" #url given in the task

tiss_page_url = find_link(url, 'tiss', '2021') #Finds the link to the latest 'Padrão TISS' page using 'tiss' and '2021' as parameters

pdf_url = find_link(tiss_page_url, 'tiss', '.pdf') #Finds the link to the 'Componente Organizacional' PDF file using 'tiss' and '.pdf' as parameters

download_pdf(pdf_url, 'padrao-tiss_componente-organizacional_202111.pdf') #Downloads the PDF to the current working directory

PDF downloaded with success!


##Test 2 - Data Transformation

###Imports

In [7]:
!pip install tabula-py

Collecting tabula-py
  Downloading tabula_py-2.3.0-py3-none-any.whl (12.0 MB)
     |████████████████████████████████| 12.0 MB 4.4 MB/s
Collecting distro
  Downloading distro-1.6.0-py2.py3-none-any.whl (19 kB)
Installing collected packages: distro, tabula-py
Successfully installed distro-1.6.0 tabula-py-2.3.0


In [20]:
import pandas as pd
from zipfile import ZipFile
import tabula

###Function Definitions

In [21]:
#Builds a structured pandas DataFrame from a dictionary, receiving the dictionary and the key to the data as parameters
def build_table_from_dict(d, key):
    code, desc = [], []

    #Saves each value from the dictionary into two separate lists, which represent the two columns of the final DataFrame
    for value in d[key].values():
        parts = value.strip().split(maxsplit=1) #Splits at the first whitespace only, because the entire row came merged into a single string
        code.append(parts[0]) #Saves the 'Código' value in the code list
        desc.append(parts[1]) #Saves the 'Descrição da categoria' value in the desc list

    final_d = {'Código': code, #Creates a structured dictionary from the two lists
               'Descrição da categoria': desc}

    df = pd.DataFrame(final_d, columns=['Código', 'Descrição da categoria']) #Creates the DataFrame from the structured dictionary

    return df

#Builds a structured pandas DataFrame from a list of DataFrame objects
def build_table_from_tables(tables):
    code, desc = [], []

    d = tables[0].to_dict() #Transforms the first table of the list into a dictionary

    #Saves each value from the two columns into two separate lists, which represent the two columns of the final DataFrame
    for (value1,value2) in zip(d['Tabela de Categoria do Padrão TISS'].values(),d['Unnamed: 0'].values()):
        desc.append(value1) #Saves the 'Descrição da categoria' value in the desc list
        code.append(value2) #Saves the 'Código' value in the code list

    del desc[0] #Removes the header of the table from the desc list
    del code[0] #Removes the header of the table from the code list

    #Saves the values from all the remaining tables into the two lists
    for table in tables[1:]:
        d = table.to_dict() #Transforms the table into a dictionary
        header = list(d.keys()) #Gets the header of the table
        code.append(int(header[0])) #Saves the header in the two lists because 'Quadro 31' spans many pages,
        desc.append(header[1])      #so each page was treated as a separate table and its first row was taken as the header
        #Saves each value from the first column in the code list
        for value in table[header[0]]:
            code.append(value)
        #Saves each value from the second column in the desc list
        for value in table[header[1]]:
            if '\r' in value: #Checks whether there is a line break in the string
                value = value.replace('\r', ' ') #Replaces the line break character with a blank space
            desc.append(value)

    final_d = {'Código': code, #Creates a structured dictionary from the two lists
               'Descrição da categoria': desc}

    df = pd.DataFrame(final_d,columns=['Código', 'Descrição da categoria']) #Creates the DataFrame from the structured dictionary

    return df

#Saves the pandas DataFrame into a CSV file at the desired path
def save_csv(df, path):
    df.to_csv(path)
    print('CSV successfully saved at path: '+path)

#Packs the CSV files into a ZIP file at the desired path (ZipFile stores files uncompressed by default)
def zip_csvs(zip_path, csv_paths):
    with ZipFile(zip_path,'w') as zf: #The with block closes the archive automatically and avoids shadowing the built-in zip()
        for path in csv_paths:
            zf.write(path)
    print('CSVs successfully zipped at path: '+zip_path)
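
Since each row of 'Quadro 30' and 'Quadro 32' arrives as one merged string, how the string is split decides how much of the description survives. The sketch below contrasts taking the second element of a plain `split()` with splitting at the first whitespace only. The helper name `split_row` and the sample rows are made up for illustration.

```python
def split_row(row):
    """Hypothetical helper: split a merged 'code description' string into its two fields."""
    code, description = row.strip().split(maxsplit=1)
    return code, description

# Made-up rows: one single-word description and one multi-word description
rows = ['1 Operadora', '7 Multi word description ']
pairs = [split_row(r) for r in rows]

# Taking the second element of a plain split() keeps only the first word
naive = rows[1].split()[1]
```

For the single-word descriptions in these two tables both approaches give the same result; the difference only shows up when a description contains spaces.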

###Main

In [22]:
pdf_file = "padrao-tiss_componente-organizacional_202111.pdf" #PDF file path

#Reads the tables on pages 114 to 120: 'Quadro 30' is on page 114,
#'Quadro 31' spans pages 115 to 120, and 'Quadro 32' is on page 120
tables = tabula.read_pdf(pdf_file, pages=(114,115,116,117,118,119,120), multiple_tables=True) 

d_30 = tables[0].to_dict() #Transforms the first table of the list into a dictionary
del d_30['Tabela de Tipo do Demandante'][0] #Removes the header from the dictionary

d_32 = tables[-1].to_dict() #Transforms the last table of the list into a dictionary
del d_32['Tabela de Tipo de Solicitação'][0] #Removes the first header from the dictionary
del d_32['Tabela de Tipo de Solicitação'][1] #Removes the second header from the dictionary
del d_32['Tabela de Tipo de Solicitação'][4] #Removes a None value from the dictionary

df_30 = build_table_from_dict(d_30,'Tabela de Tipo do Demandante') #Calls the function that builds the table of 'Quadro 30'
df_31 = build_table_from_tables(tables[1:-1]) #Calls the function that builds the table of 'Quadro 31'
df_32 = build_table_from_dict(d_32,'Tabela de Tipo de Solicitação') #Calls the function that builds the table of 'Quadro 32'

Got stderr: Dec 09, 2021 10:10:02 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>
Dec 09, 2021 10:10:04 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>
Dec 09, 2021 10:10:05 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>
Dec 09, 2021 10:10:06 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>
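
Because 'Quadro 31' continues across several pages, each continuation page comes back as its own table whose first data row has been taken as the column header. The sketch below shows that situation on a made-up two-column table and moves the header back into the data, which is the same idea `build_table_from_tables` applies with plain lists. The helper name `restore_header_row` and the sample values are assumptions for illustration.

```python
import pandas as pd

# Made-up continuation page: the first data row ('10', 'Item J') ended up as the header
page = pd.DataFrame({'10': [11, 12], 'Item J': ['Item K', 'Item L']})

def restore_header_row(table, names):
    """Hypothetical helper: move the column labels back into the data as the first row."""
    labels = list(table.columns)
    # The first label is a code that was read as text, so convert it back to an integer
    first_row = pd.DataFrame([[int(labels[0]), labels[1]]], columns=names)
    body = table.copy()
    body.columns = names  # Give the remaining rows the real column names
    return pd.concat([first_row, body], ignore_index=True)

fixed = restore_header_row(page, ['Código', 'Descrição da categoria'])
```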



In [23]:
display(df_30)

Unnamed: 0,Código,Descrição da categoria
0,1,Operadora
1,2,Prestador
2,3,Consumidor
3,4,Gestor
4,5,ANS


In [24]:
display(df_31)

Unnamed: 0,Código,Descrição da categoria
0,1,Componente Organizacional
1,2,Componente de Conteúdo e Estrutura
2,3,Componente de Representação de Conceitos em Saúde
3,4,Componente de Comunicação
4,5,Componente de Segurança e Privacidade
...,...,...
133,164,Guia de resumo de internação
134,165,Guia de serviços profissionais/serviço auxilia...
135,166,Guia de solicitação de internação
136,167,Guia de solicitação de prorrogação de internaç...


In [25]:
display(df_32)

Unnamed: 0,Código,Descrição da categoria
0,1,Alteração
1,2,Inclusão
2,3,Exclusão


In [26]:
dfs=[df_30,df_31,df_32] #Creates a list with all the returned DataFrames
csv_paths=['Tabela_de_tipo_do_Demandante.csv','Tabela_de_categoria_do_Padrão_TISS.csv','Tabela_de_tipo_de_Solicitação.csv'] #Creates a list with the desired path of each CSV file

#Saves each DataFrame to its related path
for (df,path) in zip(dfs,csv_paths):
    save_csv(df,path) #Calls the function that saves the DataFrame as a CSV file at the desired path

zip_csvs('Teste_{Victor Mafra de Holanda Ferraz}.zip', csv_paths) #Calls the function that zips the CSV files into the desired path

CSV successfully saved at path: Tabela_de_tipo_do_Demandante.csv
CSV successfully saved at path: Tabela_de_categoria_do_Padrão_TISS.csv
CSV successfully saved at path: Tabela_de_tipo_de_Solicitação.csv
CSVs successfully zipped at path: Teste_{Victor Mafra de Holanda Ferraz}.zip
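
`ZipFile` opened with mode `'w'` and no other arguments stores files without compressing them. If a smaller archive is wanted, one option is to pass `compression=ZIP_DEFLATED`. The sketch below is a hypothetical variant of `zip_csvs` (the name `zip_files_deflated` and the sample file are made up) that runs in a temporary directory and compares the stored and compressed sizes.

```python
import os
import tempfile
from zipfile import ZIP_DEFLATED, ZipFile

def zip_files_deflated(zip_path, file_paths):
    """Hypothetical variant of zip_csvs that compresses each entry with DEFLATE."""
    with ZipFile(zip_path, 'w', compression=ZIP_DEFLATED) as zf:
        for path in file_paths:
            # arcname keeps only the file name, so the archive has no directory prefix
            zf.write(path, arcname=os.path.basename(path))

# Made-up data: a highly repetitive CSV written to a temporary directory
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, 'example.csv')
with open(csv_path, 'w', encoding='utf-8') as f:
    f.write('Código,Descrição da categoria\n' + '1,Operadora\n' * 1000)

zip_path = os.path.join(workdir, 'example.zip')
zip_files_deflated(zip_path, [csv_path])

# Inspect the archive: entry names plus original and compressed sizes
with ZipFile(zip_path) as zf:
    names = zf.namelist()
    info = zf.getinfo('example.csv')
```

For three small CSV files the size difference is negligible, so the uncompressed default used in the notebook is a reasonable choice.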
