## bicycles

- [Tags] 525
- [PostLinks] 6,140
- [Badges] 80,935
- [Users] 40,571
- [Votes] 283,664
- [Comments] 131,281
- [Posts] 56,860
- [PostHistory] 146,878

## coffee

- [Tags] 115
- [PostLinks] 602
- [Comments] 4,365
- [Badges] 10,852
- [Votes] 20,663
- [Posts] 3,936
- [Users] 8,256
- [PostHistory] 10,178

## ukrainian

- [Tags] 120
- [PostLinks] 399
- [Badges] 6,248
- [Users] 3,080
- [Comments] 6,954
- [Votes] 28,867
- [Posts] 5,069
- [PostHistory] 16,102

In [1]:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# common.ipynb presumably defines read_stackexchange, ModelType and ForumType
%run common.ipynb
```


## Loading the data

In [2]:
```python
bicycles_posts_df = read_stackexchange(ModelType.POSTS, ForumType.BICYCLES)
coffee_posts_df = read_stackexchange(ModelType.POSTS, ForumType.COFFEE)
ukrainian_posts_df = read_stackexchange(ModelType.POSTS, ForumType.UKRAINIAN)
```
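
`read_stackexchange`, `ModelType` and `ForumType` presumably come from `common.ipynb`, which is not shown here. The table names listed at the top (Posts, Votes, PostHistory, ...) match the files of the public Stack Exchange data dump, where a file such as `Posts.xml` holds one `<row>` element per record with every field stored as an attribute. Assuming that format, a minimal stand-in loader might look like this; `read_posts_xml` and the sample rows are hypothetical, and the sketch returns plain dicts rather than a DataFrame.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Made-up excerpt in the layout of a Stack Exchange dump's Posts.xml:
# one <row> element per post, every field stored as an XML attribute.
SAMPLE_POSTS_XML = """
<posts>
  <row Id="1" PostTypeId="1" CreationDate="2012-06-26T15:56:24.383"
       Title="Sample question" OwnerUserId="10" />
  <row Id="2" PostTypeId="2" ParentId="1" CreationDate="2012-06-26T16:08:55.867"
       OwnerUserId="11" />
</posts>
"""

# Attributes holding integer identifiers (everything else stays a string)
INT_FIELDS = ("Id", "PostTypeId", "ParentId", "OwnerUserId")


def read_posts_xml(xml_text):
    """Hypothetical loader: one dict per <row>, ids as ints, dates as datetimes."""
    root = ET.fromstring(xml_text.strip())
    posts = []
    for row in root.iter("row"):
        post = dict(row.attrib)
        for field in INT_FIELDS:
            if field in post:  # answers have a ParentId, questions do not
                post[field] = int(post[field])
        post["CreationDate"] = datetime.fromisoformat(post["CreationDate"])
        posts.append(post)
    return posts


posts = read_posts_xml(SAMPLE_POSTS_XML)
```

With pandas 1.3 or newer, `pd.read_xml` can read the same layout straight into a DataFrame; whether `read_stackexchange` does that is not shown in this section.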

### Time from question to answer

In [60]:
```python
def analyze_v1(posts):
    # PostTypeId: 1 -> question, 2 -> answer
    questions = posts.loc[posts.PostTypeId == 1, ['Id', 'CreationDate', 'Title', 'OwnerUserId']].copy()
    answers = posts.loc[posts.PostTypeId == 2, ['Id', 'ParentId', 'CreationDate', 'Body', 'OwnerUserId']].copy()

    questions['CreationDate_datetime'] = pd.to_datetime(questions['CreationDate'])
    answers['CreationDate_datetime'] = pd.to_datetime(answers['CreationDate'])

    # Pair every question with each of its answers (answer.ParentId == question.Id)
    df = questions.join(answers.set_index('ParentId'), on='Id',
                        lsuffix='_question', rsuffix='_answer', how='inner')
    # Delay between the question and a given answer
    df['diff_time'] = df['CreationDate_datetime_answer'] - df['CreationDate_datetime_question']
    # df['diff_hours'] = df['diff_time'] / np.timedelta64(1, 'h')  # same delay in hours

    # Per question: delay of the first (min) and the last (max) answer
    df = df.groupby('Id_question').agg({'diff_time': ['min', 'max']})
    df.columns = ['_'.join(col) for col in df.columns.values]

    # Across questions: smallest and largest value of each per-question statistic
    return df.agg({'diff_time_min': ['min', 'max'], 'diff_time_max': ['min', 'max']})
```
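
`analyze_v1` reduces the data in two steps: per question it takes the delay of the earliest and of the latest answer, and then it reports the smallest and largest of each of those values across all questions. The same logic in plain Python, on post dicts with `Id`, `PostTypeId`, `ParentId` and `CreationDate` keys (an assumed shape, not the notebook's DataFrame), might look like this; `answer_delay_summary` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def answer_delay_summary(posts):
    """Plain-Python mirror of analyze_v1: join, per-question agg, overall agg."""
    question_dates = {p["Id"]: p["CreationDate"] for p in posts if p["PostTypeId"] == 1}

    # Step 1 (the join): delay of every answer, grouped by its question.
    # Answers whose question is missing are skipped, as with an inner join.
    delays = defaultdict(list)
    for p in posts:
        if p["PostTypeId"] == 2 and p["ParentId"] in question_dates:
            delays[p["ParentId"]].append(p["CreationDate"] - question_dates[p["ParentId"]])

    # Step 2 (groupby): first and last answer delay of each question
    first = [min(d) for d in delays.values()]
    last = [max(d) for d in delays.values()]

    # Step 3 (final agg): overall (min, max) of each per-question statistic
    return {
        "diff_time_min": (min(first), max(first)),
        "diff_time_max": (min(last), max(last)),
    }


t0 = datetime(2012, 6, 26, 12, 0)
sample = [
    {"Id": 1, "PostTypeId": 1, "CreationDate": t0},
    {"Id": 2, "PostTypeId": 2, "ParentId": 1, "CreationDate": t0 + timedelta(hours=1)},
    {"Id": 3, "PostTypeId": 2, "ParentId": 1, "CreationDate": t0 + timedelta(days=2)},
    {"Id": 4, "PostTypeId": 1, "CreationDate": t0},
    {"Id": 5, "PostTypeId": 2, "ParentId": 4, "CreationDate": t0 + timedelta(minutes=5)},
    {"Id": 6, "PostTypeId": 1, "CreationDate": t0},  # unanswered: absent from the summary
]
summary = answer_delay_summary(sample)
```

Here `summary["diff_time_min"]` is `(timedelta(minutes=5), timedelta(hours=1))` and `summary["diff_time_max"]` is `(timedelta(minutes=5), timedelta(days=2))`. Nothing in these steps rejects a zero or negative delay, so any such pair flows straight into the final table.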


In [62]:
```python
analyze_v1(bicycles_posts_df)
```

| | diff_time_min | diff_time_max |
|---|---|---|
| min | -292 days +20:16:49.880000 | 0 days 00:00:00 |
| max | 1619 days 12:16:06.473000 | 3806 days 00:28:05.467000 |


In [65]:
```python
analyze_v1(coffee_posts_df)
```

| | diff_time_min | diff_time_max |
|---|---|---|
| min | 0 days 00:00:00 | 0 days 00:00:00 |
| max | 1679 days 21:41:25.017000 | 2178 days 04:13:18.530000 |


In [66]:
```python
analyze_v1(ukrainian_posts_df)
```

| | diff_time_min | diff_time_max |
|---|---|---|
| min | 0 days 00:00:00 | 0 days 00:00:00 |
| max | 1195 days 04:04:19.527000 | 1429 days 06:35:39.454000 |


The results are curious: some delays are zero, and on bicycles the smallest one is even negative.

Let's start with the negative ones.

After a short investigation, we find the culprit:

In [83]:
```python
def nostradamus():
    questions = bicycles_posts_df.loc[bicycles_posts_df.PostTypeId == 1, ['Id', 'CreationDate', 'Title']].copy()
    answers = bicycles_posts_df.loc[bicycles_posts_df.PostTypeId == 2, ['Id', 'ParentId', 'CreationDate']].copy()

    questions['CreationDate_datetime'] = pd.to_datetime(questions['CreationDate'])
    answers['CreationDate_datetime'] = pd.to_datetime(answers['CreationDate'])

    df = questions.join(answers.set_index('ParentId'), on='Id',
                        lsuffix='_question', rsuffix='_answer', how='inner')
    df['diff_time'] = df['CreationDate_datetime_answer'] - df['CreationDate_datetime_question']

    # Question 10069 is the one behind the negative delay found above
    return df.loc[df['Id_question'] == 10069,
                  ['Title', 'CreationDate_question', 'CreationDate_answer', 'diff_time']]
```

In [84]:
```python
nostradamus()
```

| | Title | CreationDate_question | CreationDate_answer | diff_time |
|---|---|---|---|---|
| 9293 | Does drafting cause resistance to the lead rider? | 2012-06-26T15:56:24.383 | 2011-09-09T12:13:14.263 | -292 days +20:16:49.880000 |
| 9293 | Does drafting cause resistance to the lead rider? | 2012-06-26T15:56:24.383 | 2011-09-09T13:09:40.740 | -292 days +21:13:16.357000 |
| 9293 | Does drafting cause resistance to the lead rider? | 2012-06-26T15:56:24.383 | 2012-06-26T16:08:55.867 | 0 days 00:12:31.484000 |
| 9293 | Does drafting cause resistance to the lead rider? | 2012-06-26T15:56:24.383 | 2012-06-26T22:40:22.667 | 0 days 06:43:58.284000 |
| 9293 | Does drafting cause resistance to the lead rider? | 2012-06-26T15:56:24.383 | 2012-06-27T11:20:53.213 | 0 days 19:24:28.830000 |
| 9293 | Does drafting cause resistance to the lead rider? | 2012-06-26T15:56:24.383 | 2015-08-25T04:43:32.313 | 1154 days 12:47:07.930000 |


Indeed, user [coco](https://bicycles.stackexchange.com/users/4394/coco), hereafter known as Nostradamus, asked a question in 2012 (the only one of their entire *bicycles.stackexchange.com* career), and then (although one is tempted to say *before that*) users [Daniel R Hicks](https://bicycles.stackexchange.com/users/1584/daniel-r-hicks) and [Angelo](https://bicycles.stackexchange.com/users/1998/angelo) answered it in 2011.

> asked Jun 26 '12 at 15:56
>
> coco

> answered Sep 9 '11 at 12:13
>
> Daniel R Hicks

> answered Sep 9 '11 at 13:09
>
> Angelo

The evidence of this time travel is still up on the site: https://bicycles.stackexchange.com/questions/10069/does-drafting-cause-resistance-to-the-lead-rider
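
`nostradamus()` inspects a single hard-coded question. To list every answer that predates its question, one can keep the pairs with a negative delay. The sketch below does this in plain Python over post dicts (an assumed shape, not the DataFrame above); `find_time_travellers` is a hypothetical helper, and the answer Ids in the sample are made up, while the timestamps are copied from the `nostradamus()` output. As for the cause, one plausible explanation, not verified here, is a question merge: when a duplicate is merged into another question, its answers are moved over with their original dates.

```python
from datetime import datetime, timedelta


def find_time_travellers(posts):
    """List (question Id, answer Id, delay) for every answer created before its question."""
    question_dates = {p["Id"]: p["CreationDate"] for p in posts if p["PostTypeId"] == 1}
    found = []
    for p in posts:
        if p["PostTypeId"] == 2 and p["ParentId"] in question_dates:
            delay = p["CreationDate"] - question_dates[p["ParentId"]]
            if delay < timedelta(0):
                found.append((p["ParentId"], p["Id"], delay))
    return found


# Question 10069 with timestamps from the nostradamus() output;
# the answer Ids (901-903) are made up for this example.
ts = datetime.fromisoformat
posts = [
    {"Id": 10069, "PostTypeId": 1, "CreationDate": ts("2012-06-26T15:56:24.383")},
    {"Id": 901, "PostTypeId": 2, "ParentId": 10069, "CreationDate": ts("2011-09-09T12:13:14.263")},
    {"Id": 902, "PostTypeId": 2, "ParentId": 10069, "CreationDate": ts("2011-09-09T13:09:40.740")},
    {"Id": 903, "PostTypeId": 2, "ParentId": 10069, "CreationDate": ts("2012-06-26T16:08:55.867")},
]
travellers = find_time_travellers(posts)
```

The two 2011 answers come back with the same `-292 days, 20:16:49.88` and `-292 days, 21:13:16.357` delays that pandas printed; run over all posts of a forum, the helper would show whether question 10069 is the only such case.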

In [75]:
```python
def zeroday(posts):
    # PostTypeId: 1 -> question, 2 -> answer
    questions = posts.loc[posts.PostTypeId == 1, ['Id', 'CreationDate', 'Title', 'OwnerUserId']].copy()
    answers = posts.loc[posts.PostTypeId == 2, ['Id', 'ParentId', 'CreationDate', 'Body', 'OwnerUserId']].copy()

    questions['CreationDate_datetime'] = pd.to_datetime(questions['CreationDate'])
    answers['CreationDate_datetime'] = pd.to_datetime(answers['CreationDate'])

    df = questions.join(answers.set_index('ParentId'), on='Id',
                        lsuffix='_question', rsuffix='_answer', how='inner')

    # Did the asker answer their own question?
    df['IsSame'] = df['OwnerUserId_question'] == df['OwnerUserId_answer']

    # Keep only answers created at exactly the same moment as their question
    return df.loc[
        df['CreationDate_datetime_question'] == df['CreationDate_datetime_answer'],
        ['Id_question', 'Title', 'IsSame']
    ]
```


In [76]:
```python
zeroday(bicycles_posts_df)
```

| | Id_question | Title | IsSame |
|---|---|---|---|
| 11165 | 13157 | I know there are better rain-forecast sources ... | True |
| 11550 | 13573 | What are some alternatives to fruitlessly ring... | True |
| 12018 | 14102 | Is the Suunto Ambit compatible with the Bontra... | True |
| 12779 | 14965 | What are my options for cycling mirrors? | True |
| 13509 | 15827 | Fixed Gear Chainring Centering | True |
| 13606 | 15945 | How can I run a dynamo wire from my front hub ... | True |
| 13857 | 16215 | How can I prevent cars from passing me too clo... | True |
| 14217 | 16617 | What's an "Auto-Mini" folding bike? Can I use ... | True |
| 14377 | 16794 | Can I carry my tandem on a bumper carrier? | True |
| 16081 | 18713 | What are the effects of completely filling tir... | True |


In [77]:
```python
zeroday(coffee_posts_df)
```

| | Id_question | Title | IsSame |
|---|---|---|---|
| 318 | 339 | Pouring technique for drip/pour over coffee | True |
| 384 | 408 | What is the difference between a long (luongo)... | True |
| 394 | 418 | What's the difference between a percolator and... | True |
| 406 | 431 | How can I make filtering my cold brew easier? | True |
| 479 | 1512 | What is the process for brewing egg coffee? | True |
| 528 | 1564 | What do the terms extraction and strength mean? | True |
| 865 | 1938 | How to (re-)calibrate an espresso machine and ... | True |
| 1101 | 2195 | How much sugar can I put it my coffee before i... | True |
| 1382 | 2515 | How to fix burr contact/alignment issues on an... | True |
| 1708 | 2881 | Is there a way to economically store freshly r... | True |


In [78]:
```python
zeroday(ukrainian_posts_df)
```

| | Id_question | Title | IsSame |
|---|---|---|---|
| 36 | 40 | Який символ використовувати для позначення апо... | True |
| 435 | 467 | Чи є загальне правило, яке керує написанням "д... | True |
| 473 | 505 | Милозвучність української мови. Закони й засоби | True |
| 600 | 639 | Як правильно вживати слова "воєнний", "військо... | True |
| 659 | 699 | Чи можна вживати слово "калитка" для позначенн... | True |
| 843 | 900 | «Закордон» чи «за кордон», «закордоном» чи «за... | True |
| 870 | 927 | Чи справді вираз «ти правий» у значеннях «твоя... | True |
| 1199 | 1270 | Судно́ чи су́дно? | True |
| 1996 | 3148 | Лі́карський чи Ліка́рський? Як правильно? | True |
| 2003 | 3155 | "Кажан" чи "летюча миша"? Як правильно? | True |
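
In every row printed above `IsSame` is `True`: the asker answered their own question. This fits Stack Exchange's option to post an answer of your own together with the question, which would plausibly give both posts the same timestamp. Since each output is cut to ten rows, the sketch below checks the idea across all pairs at once: it counts zero-delay answers and how many of them are self-answers. It works on plain post dicts (an assumed shape, not the DataFrame above), `zero_delay_stats` is a hypothetical helper, and the sample posts are made up.

```python
from datetime import datetime


def zero_delay_stats(posts):
    """Return (answers posted at the same instant as their question, how many are self-answers)."""
    questions = {p["Id"]: p for p in posts if p["PostTypeId"] == 1}
    zero = self_answers = 0
    for p in posts:
        if p["PostTypeId"] != 2 or p["ParentId"] not in questions:
            continue
        q = questions[p["ParentId"]]
        if p["CreationDate"] != q["CreationDate"]:
            continue
        zero += 1
        owner = p.get("OwnerUserId")
        # A missing owner (e.g. a deleted account) never counts as a match,
        # just as NaN == NaN is False in the pandas IsSame column
        if owner is not None and owner == q.get("OwnerUserId"):
            self_answers += 1
    return zero, self_answers


t = datetime(2013, 1, 1, 9, 30)
posts = [
    {"Id": 1, "PostTypeId": 1, "CreationDate": t, "OwnerUserId": 7},
    {"Id": 2, "PostTypeId": 2, "ParentId": 1, "CreationDate": t, "OwnerUserId": 7},     # self-answer
    {"Id": 3, "PostTypeId": 2, "ParentId": 1, "CreationDate": t, "OwnerUserId": None},  # owner unknown
    {"Id": 4, "PostTypeId": 2, "ParentId": 1, "CreationDate": datetime(2013, 1, 2), "OwnerUserId": 8},
]
stats = zero_delay_stats(posts)
```

On this made-up data `stats` is `(2, 1)`. On a full forum, two equal counts would mean that every zero delay is a self-answer.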


In [79]:
```python
def analyze_v2(posts):
    # PostTypeId: 1 -> question, 2 -> answer
    questions = posts.loc[posts.PostTypeId == 1, ['Id', 'CreationDate', 'Title', 'OwnerUserId']].copy()
    answers = posts.loc[posts.PostTypeId == 2, ['Id', 'ParentId', 'CreationDate', 'Body', 'OwnerUserId']].copy()

    questions['CreationDate_datetime'] = pd.to_datetime(questions['CreationDate'])
    answers['CreationDate_datetime'] = pd.to_datetime(answers['CreationDate'])

    df = questions.join(answers.set_index('ParentId'), on='Id',
                        lsuffix='_question', rsuffix='_answer', how='inner')
    df['diff_time'] = df['CreationDate_datetime_answer'] - df['CreationDate_datetime_question']
    # df['diff_hours'] = df['diff_time'] / np.timedelta64(1, 'h')  # same delay in hours

    # Unlike analyze_v1: drop answers that are older than (negative delay)
    # or simultaneous with (zero delay) their question
    df = df.loc[df['CreationDate_datetime_answer'] > df['CreationDate_datetime_question']]

    # Per question: delay of the first (min) and the last (max) answer
    df = df.groupby('Id_question').agg({'diff_time': ['min', 'max']})
    df.columns = ['_'.join(col) for col in df.columns.values]

    # Across questions: smallest and largest value of each per-question statistic
    return df.agg({'diff_time_min': ['min', 'max'], 'diff_time_max': ['min', 'max']})
```
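
`analyze_v2` keeps only answers posted strictly after their question, so a single comparison removes both the negative delays (Nostradamus) and the zero delays (the self-answers above). The commented-out `diff_hours` line suggests the delays were also meant to be read in hours; in plain Python, dividing a `timedelta` by `timedelta(hours=1)` gives that number. A sketch of both steps on a hand-made list of delays, with `positive_delays_in_hours` as a hypothetical name:

```python
from datetime import timedelta


def positive_delays_in_hours(delays):
    """Keep strictly positive delays (like analyze_v2's '>' filter) and express them in hours."""
    return [d / timedelta(hours=1) for d in delays if d > timedelta(0)]


delays = [
    timedelta(days=-292, hours=20, minutes=16, seconds=49, milliseconds=880),  # answer older than its question
    timedelta(0),                      # answer posted together with the question
    timedelta(minutes=1, seconds=30),  # 0.025 h
    timedelta(days=2, hours=6),        # 54 h
]
hours = positive_delays_in_hours(delays)
```

In the pandas code the equivalent conversion is `df['diff_time'] / np.timedelta64(1, 'h')`, which yields a float column of hours.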


In [81]:
```python
analyze_v2(bicycles_posts_df)
```

| | diff_time_min | diff_time_max |
|---|---|---|
| min | 0 days 00:00:00.187000 | 0 days 00:01:30.724000 |
| max | 1619 days 12:16:06.473000 | 3806 days 00:28:05.467000 |


In [80]:
```python
analyze_v2(coffee_posts_df)
```

| | diff_time_min | diff_time_max |
|---|---|---|
| min | 0 days 00:01:04.147000 | 0 days 00:01:04.147000 |
| max | 1679 days 21:41:25.017000 | 2178 days 04:13:18.530000 |


In [82]:
```python
analyze_v2(ukrainian_posts_df)
```

| | diff_time_min | diff_time_max |
|---|---|---|
| min | 0 days 00:00:15.744000 | 0 days 00:00:15.744000 |
| max | 1195 days 04:04:19.527000 | 1429 days 06:35:39.454000 |
