## <span style="color:yellow; font-weight:bold; font-family:Times, sans-serif">Library Seekers</span>

#### <span style="color:yellow; font-weight:bold; font-family:Times, sans-serif">Introduction</span>

This Jupyter Notebook extracts information from the previously cleaned datasets and lists the most significant readers under the chosen business rules. It can be used to explore different reader profiles, such as the fastest readers or those who write the most reviews. The company can then, if it wishes, identify the most relevant readers and establish partnerships with them, or use them in the marketing of a given book, for example.

#### <span style="color:yellow; font-weight:bold; font-family:Times, sans-serif">Installation</span>
To run this notebook, you will need the following libraries and dependencies:

```python
!pip install pandas pyarrow
```

### 1.0 - Imports and settings

In [234]:
import pandas as pd

In [235]:
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

### 2.0 - Datasets

In [236]:
bd = pd.read_parquet('../../Desafio_A3Data/Databases/work_books_data.parquet', engine='pyarrow')
br = pd.read_parquet('../../Desafio_A3Data/Databases/work_books_rating.parquet', engine='pyarrow')

In [237]:
print(bd.shape)
bd.head(2)

(38, 16)


|  | Title | description | authors | image | previewLink | publisher | publishedDate | infoLink | categories | ratingsCount | Price | score | time | readersRatings | totalRatings | summaryDescription |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4553 | Death in the Afternoon | Still considered one of the best books ever wr... | Ernest Hemingway | http://books.google.com/books/content?id=AdFQA... | http://books.google.nl/books?id=AdFQAQAAQBAJ&p... | Simon and Schuster | 2014 | https://play.google.com/store/books/details?id... | Literary Criticism | 9.0 |  | 3.96 | 308.964 | 100 | 109.0 | Death in the Afternoon is an impassioned look ... |
| 9230 | The Geographer's Library | Item 1: An alembic is the top part of an appar... | Jon Fasman | http://books.google.com/books/content?id=H5ogA... | http://books.google.nl/books?id=H5ogAQAAIAAJ&q... | Penguin Press HC | 2005 | http://books.google.nl/books?id=H5ogAQAAIAAJ&d... | Fiction | 33.0 |  | 3.06 | 324.26232 | 100 | 133.0 | Item 1: An alembic is the top part of an appar... |


In [238]:
print(br.shape)
br.head(2)

(3800, 13)


|  | Id | Title | Price | User_id | profileName | score | time | summary | text | summary_posit_score | summary_neut_score | summary_neg_score | summary_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 66533 | B000JWW1UG | Death in the Afternoon |  | A1ZA12IECEKIY | Mark Cohen | 4.0 | 376.68 | Man vs Beast and the Birth of Civilization | Bullfighting was born in Spain many centuries ... | 0.531892 | 0.185069 | 0.283039 | positivo |
| 66534 | B000JWW1UG | Death in the Afternoon |  | A2UO3C0HLKCZ4F | R. Huitron | 4.0 | 376.32 | if one is born outside of bullfighting culture... | A lot of times people end up hating something ... | 0.190422 | 0.166635 | 0.642942 | negativo |


### 3.0 - Development

In [239]:
# Mean 'score' and 'time' per reader
mean_br = br.groupby('User_id').agg({'score': 'mean', 'time': 'mean'}).reset_index()

# Count of 'positivo' and 'negativo' labels in 'summary_score' per reader
count_br = br.groupby(['User_id', 'summary_score']).size().unstack(fill_value=0).reset_index()

# Rename the columns to make them easier to read
count_br.columns.name = None
count_br = count_br.rename(columns={'positivo': 'positive_count', 'negativo': 'negative_count'})

# Join the two DataFrames
result_br = pd.merge(mean_br, count_br, on='User_id')

# Total number of reviews per reader
result_br['total_count'] = result_br['negative_count'] + result_br['positive_count']

result_br.head()

|  | User_id | score | time | negative_count | positive_count | total_count |
|---|---|---|---|---|---|---|
| 0 | A08561391R9ZTNTDI35MO | 5.0 | 377.04 | 0 | 1 | 1 |
| 1 | A103U0Q3IKSXHE | 4.0 | 263.64 | 0 | 1 | 1 |
| 2 | A107C4RVRF0OP | 4.0 | 264.288 | 0 | 1 | 1 |
| 3 | A10ALALG39VWNJ | 5.0 | 254.376 | 0 | 1 | 1 |
| 4 | A10BG7HG8422UK | 5.0 | 367.344 | 0 | 2 | 2 |
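
The aggregation above can be reproduced on a toy DataFrame. The sketch below is illustrative only: the sample rows are invented, and the `reindex` step is an addition that is not part of the notebook, included on the assumption that a smaller dataset might lack one of the two labels entirely.

```python
import pandas as pd

# Invented sample data that mimics the columns used from `br`.
toy = pd.DataFrame({
    'User_id': ['u1', 'u1', 'u2', 'u3', 'u3', 'u3'],
    'score': [5.0, 3.0, 4.0, 5.0, 5.0, 5.0],
    'time': [300.0, 200.0, 260.0, 240.0, 250.0, 260.0],
    'summary_score': ['positivo', 'negativo', 'positivo',
                      'positivo', 'positivo', 'positivo'],
})

# Per-reader means of score and reading time.
means = toy.groupby('User_id').agg({'score': 'mean', 'time': 'mean'}).reset_index()

# Per-reader label counts; reindex guarantees both label columns exist
# even if one label never occurs (an assumption, not in the notebook).
counts = (
    toy.groupby(['User_id', 'summary_score']).size()
    .unstack(fill_value=0)
    .reindex(columns=['negativo', 'positivo'], fill_value=0)
    .rename(columns={'positivo': 'positive_count', 'negativo': 'negative_count'})
    .reset_index()
)
counts.columns.name = None

result = pd.merge(means, counts, on='User_id')
result['total_count'] = result['negative_count'] + result['positive_count']
print(result)
```

Without the `reindex`, a dataset containing only `positivo` rows would have no `negativo` column after `unstack`, and the `total_count` sum would raise a `KeyError`.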


In [240]:
# These DataFrames feed the cells below, which list the most significant readers under each chosen rule
sortByEvaluations = result_br.sort_values(by='total_count', ascending=False)
sortByReadingTime = result_br.sort_values(by='time', ascending=True)
sortByScore = result_br.sort_values(by='score', ascending=False)

In [241]:
# Function that returns the readers' names and the books each of them read
def get_readers(selected_dataset):
    readers = selected_dataset.merge(br[['User_id', 'profileName', 'Title']], on='User_id', how='inner')
    grouped_readers = readers[['User_id', 'profileName', 'Title']].groupby(['User_id', 'profileName'])['Title'].apply(lambda x: '; '.join(set(x))).reset_index()
    
    return grouped_readers
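
To see what `get_readers` produces, here is a minimal self-contained sketch on invented data. `toy_br`, `toy_selected`, and `get_readers_sorted` are hypothetical names. The variant differs from the function above in two ways: it takes the reviews table as an explicit argument, and it sorts the titles before joining them, an assumption made so the output order is reproducible, since iterating a `set` has no guaranteed order.

```python
import pandas as pd

# Invented reviews table with the three columns the function relies on.
toy_br = pd.DataFrame({
    'User_id': ['u1', 'u1', 'u1', 'u2'],
    'profileName': ['Ana', 'Ana', 'Ana', 'Bob'],
    'Title': ['Book B', 'Book A', 'Book B', 'Book C'],
})

# Invented selection, standing in for a filtered slice of result_br.
toy_selected = pd.DataFrame({'User_id': ['u1', 'u2'], 'total_count': [3, 1]})


def get_readers_sorted(selected_dataset, reviews):
    """Hypothetical variant of get_readers: explicit reviews table, sorted titles."""
    readers = selected_dataset.merge(
        reviews[['User_id', 'profileName', 'Title']], on='User_id', how='inner'
    )
    return (
        readers.groupby(['User_id', 'profileName'])['Title']
        .apply(lambda titles: '; '.join(sorted(set(titles))))
        .reset_index()
    )


toy_readers = get_readers_sorted(toy_selected, toy_br)
print(toy_readers)
```

Note that `groupby` sorts by `User_id`, so the returned rows are ordered by reader ID rather than by the ranking of the selection passed in.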


In [242]:
# Readers who review the most
threshold = 5
selected = sortByEvaluations[sortByEvaluations['total_count'] >= threshold][:10]
readers = get_readers(selected)
readers

|  | User_id | profileName | Title |
|---|---|---|---|
| 0 | A114YQ7ZT9Y1W5 | Steiner | Death in the Afternoon; The Wapshot Chronicle |
| 1 | A1K1JW1C5CUSUZ | Donald Mitchell "Jesus Loves You!" | Death in the Afternoon; The Wapshot Chronicle |
| 2 | A1L43KWWR05PCS | Lawyeraau | Don't Pee on My Leg and Tell Me It's Raining: ... |
| 3 | A2AYSFGUP5VTY3 | J. Smallridge | The Wapshot Chronicle; Member of the Wedding; ... |
| 4 | AA0BB36HOBYLC | Deborah Dieleman "Jen D" | A Searching Heart (Prairie Legacy Series #2); ... |
| 5 | ATTLUK81FJVJ3 | Tom Bruce | Look Homeward, Angel; The Wapshot Chronicle; M... |


In [243]:
readers.to_parquet('../../Desafio_A3Data/Databases/top_avaliadores_maio.parquet', engine='pyarrow')

In [244]:
# Fastest readers
threshold = 250
selected = sortByReadingTime[sortByReadingTime['time'] <= threshold][:10]
readers = get_readers(selected)
readers

|  | User_id | profileName | Title |
|---|---|---|---|
| 0 | A13O378B1DOMAT | vida@sat.net | LifeSupport |
| 1 | A1PS66N855AO4D | cdunigan@hotmail.com | Joni |
| 2 | A1XORYSYC1UFVS | JackRyan82@aol.com | Don't Pee on My Leg and Tell Me It's Raining: ... |
| 3 | A1Y8AZD329JVMX | ttaylor@truman.edu | Look Homeward, Angel |
| 4 | A207HBXLAPVVOV | grace1965@webtv.net | Ramona the pest |
| 5 | A2GAC1FNBTWELF | a_patton@rocketmail.com | LifeSupport |
| 6 | A2LYQ67O4B01QD | Kathleen T. Choi | LifeSupport |
| 7 | A2S7H7Y07NOPE2 | kk@interport.net | A Death In The Family |
| 8 | A35BFOXTB2D1E6 | cuiry@bigfoot.com | Don't Pee on My Leg and Tell Me It's Raining: ... |
| 9 | A35T17R6UPKHQ8 | jonath38@wharton.upenn.edu | Don't Pee on My Leg and Tell Me It's Raining: ... |


In [245]:
readers.to_parquet('../../Desafio_A3Data/Databases/top_velozes_maio.parquet', engine='pyarrow')

In [246]:
# Positive readers with the most authority (top mean score and many reviews)
score_threshold = 5
count_threshold = 5
selected = sortByScore[(sortByScore['score'] >= score_threshold) & (sortByScore['total_count'] >= count_threshold)][:10]
readers = get_readers(selected)
readers

|  | User_id | profileName | Title |
|---|---|---|---|
| 0 | AA0BB36HOBYLC | Deborah Dieleman "Jen D" | A Searching Heart (Prairie Legacy Series #2); ... |
| 1 | ATTLUK81FJVJ3 | Tom Bruce | Look Homeward, Angel; The Wapshot Chronicle; M... |


In [247]:
readers.to_parquet('../../Desafio_A3Data/Databases/top_entendedores_maio.parquet', engine='pyarrow')

In [248]:
# This print shows the summaryDescription variable, which is meant to act as a teaser
# for the reader. The idea is to use the generated text as input for a web page, for example,
# showing that a given book covers, in short, information "x"
booktitle = 'Look Homeward, Angel'
print(bd[bd['Title'] == booktitle]['summaryDescription'].iloc[0])

It is Wolfe's first novel, and is considered a highly autobiographical American coming-of-age story . The character of Eugene Gant is generally believed to be a depiction of Wolfe himself .
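
One way the summary text might feed a web page, as the comment in the cell suggests, is to wrap it in a small HTML fragment. The sketch below is only an assumption about how that could look: `render_book_teaser` and the CSS class name are hypothetical, and the summary string is a shortened, invented stand-in passed as plain text rather than read from `bd`.

```python
from html import escape


def render_book_teaser(title: str, summary: str) -> str:
    """Return an HTML fragment; both fields are escaped to avoid markup injection."""
    return (
        '<section class="book-teaser">'
        f'<h2>{escape(title)}</h2>'
        f'<p>{escape(summary)}</p>'
        '</section>'
    )


# Example call with the title used above and an invented summary that
# contains characters needing escaping.
fragment = render_book_teaser(
    'Look Homeward, Angel',
    "It is Wolfe's first novel & a coming-of-age story.",
)
print(fragment)
```

Escaping matters here because the summaries are machine-generated text and could contain characters such as `&` or `<` that would otherwise be interpreted as markup.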
