Working with the data file https://raw.githubusercontent.com/jvns/pandas-cookbook/master/data/popularity-contest
This example shows how to convert Unix timestamps into regular dates and times.

In [1]:
%matplotlib inline

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

plt.style.use('ggplot')  # Nicer-looking plots
plt.rcParams['figure.figsize'] = (15, 5)  # Default figure size

In [2]:
# Read it, and remove the last row
popcon = pd.read_csv(r'https://raw.githubusercontent.com/jvns/pandas-cookbook/master/data/popularity-contest', sep=' ')[:-1]
popcon.columns = ['atime', 'ctime', 'package-name', 'mru-program', 'tag']

In [3]:
popcon

Unnamed: 0,atime,ctime,package-name,mru-program,tag
0,1387295797,1367633260,perl-base,/usr/bin/perl,
1,1387295796,1354370480,login,/bin/su,
2,1387295743,1354341275,libtalloc2,/usr/lib/x86_64-linux-gnu/libtalloc.so.2.0.7,
3,1387295743,1387224204,libwbclient0,/usr/lib/x86_64-linux-gnu/libwbclient.so.0,<RECENT-CTIME>
4,1387295742,1354341253,libselinux1,/lib/x86_64-linux-gnu/libselinux.so.1,
...,...,...,...,...,...
2892,0,0,libreadline-dev,<NOFILES>,
2893,0,0,notify-osd-icons,<NOFILES>,
2894,0,0,python-apt-common,<NOFILES>,
2895,0,0,libindicator-messages-status-provider1,<NOFILES>,


Convert the time columns to integers

In [4]:
popcon['atime'] = popcon['atime'].astype(int)
popcon['ctime'] = popcon['ctime'].astype(int)

Every NumPy array and pandas Series has a type (dtype), usually int64, float64, or object. The time types include datetime64[s], datetime64[ms], and datetime64[us]. There is also a timedelta type.
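As a quick illustration of the dtypes mentioned above, here is a minimal sketch on small hand-made values (the Series and the reference midnight are made up for the demo; only the timestamp 1387295797 comes from the data):

```python
import numpy as np
import pandas as pd

# Plain numeric data gets the usual numeric dtypes.
floats = pd.Series([1.0, 2.5])
print(floats.dtype)  # float64

# A NumPy datetime64 array carries its unit (here seconds) in the dtype.
seconds = np.array([1387295797], dtype='int64').astype('datetime64[s]')
print(seconds.dtype)  # datetime64[s]
print(seconds[0])     # 2013-12-17T15:56:37

# Subtracting two datetimes gives a timedelta.
delta = seconds[0] - np.datetime64('2013-12-17T00:00:00')
print(delta)  # 57397 seconds
```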

We can use the pd.to_datetime function to convert the numbers to datetime. The argument unit='s' says that the values are seconds since the Unix epoch.

In [5]:
popcon['atime'] = pd.to_datetime(popcon['atime'], unit='s')
popcon['ctime'] = pd.to_datetime(popcon['ctime'], unit='s')
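The same conversion can be tried on a tiny Series first. This is only a sketch: the three values are copied from the table above, and the `unit='ms'` call is added purely to show what a wrong unit would do.

```python
import pandas as pd

# Seconds since the Unix epoch, as stored in the popularity-contest file.
raw = pd.Series([1387295797, 1367633260, 0])

converted = pd.to_datetime(raw, unit='s')
print(converted)

# With the wrong unit the call still succeeds but the dates are wrong:
# the same number read as milliseconds lands in January 1970.
wrong = pd.to_datetime(raw, unit='ms')
print(wrong.iloc[0])
```

Checking one or two converted values against a known date is a cheap way to confirm the unit.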

Check the dtype: M8 is the code for datetime64

In [6]:
popcon['atime'].dtype

dtype('<M8[ns]')
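To confirm that M8 and datetime64 name the same type, a small check like the following may help (a sketch on a made-up two-element Series, not on popcon itself):

```python
import numpy as np
import pandas as pd

# 'M8[ns]' is just the short type code for 'datetime64[ns]'.
same = np.dtype('M8[ns]') == np.dtype('datetime64[ns]')
print(same)  # True

# A more readable way to test a column than inspecting the code by eye.
s = pd.to_datetime(pd.Series([0, 1]), unit='s')
is_dt = pd.api.types.is_datetime64_any_dtype(s)
print(is_dt)         # True
print(s.dtype.kind)  # M
```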

Let's see what the times look like after the conversion

In [7]:
popcon

Unnamed: 0,atime,ctime,package-name,mru-program,tag
0,2013-12-17 15:56:37,2013-05-04 02:07:40,perl-base,/usr/bin/perl,
1,2013-12-17 15:56:36,2012-12-01 14:01:20,login,/bin/su,
2,2013-12-17 15:55:43,2012-12-01 05:54:35,libtalloc2,/usr/lib/x86_64-linux-gnu/libtalloc.so.2.0.7,
3,2013-12-17 15:55:43,2013-12-16 20:03:24,libwbclient0,/usr/lib/x86_64-linux-gnu/libwbclient.so.0,<RECENT-CTIME>
4,2013-12-17 15:55:42,2012-12-01 05:54:13,libselinux1,/lib/x86_64-linux-gnu/libselinux.so.1,
...,...,...,...,...,...
2892,1970-01-01 00:00:00,1970-01-01 00:00:00,libreadline-dev,<NOFILES>,
2893,1970-01-01 00:00:00,1970-01-01 00:00:00,notify-osd-icons,<NOFILES>,
2894,1970-01-01 00:00:00,1970-01-01 00:00:00,python-apt-common,<NOFILES>,
2895,1970-01-01 00:00:00,1970-01-01 00:00:00,libindicator-messages-status-provider1,<NOFILES>,


Drop the rows whose timestamps were 0 before the conversion and became 1970-01-01 afterwards

In [8]:
popcon = popcon[popcon['atime'] > '1970-01-01']

In [9]:
popcon

Unnamed: 0,atime,ctime,package-name,mru-program,tag
0,2013-12-17 15:56:37,2013-05-04 02:07:40,perl-base,/usr/bin/perl,
1,2013-12-17 15:56:36,2012-12-01 14:01:20,login,/bin/su,
2,2013-12-17 15:55:43,2012-12-01 05:54:35,libtalloc2,/usr/lib/x86_64-linux-gnu/libtalloc.so.2.0.7,
3,2013-12-17 15:55:43,2013-12-16 20:03:24,libwbclient0,/usr/lib/x86_64-linux-gnu/libwbclient.so.0,<RECENT-CTIME>
4,2013-12-17 15:55:42,2012-12-01 05:54:13,libselinux1,/lib/x86_64-linux-gnu/libselinux.so.1,
...,...,...,...,...,...
2093,2010-10-15 16:41:50,2012-12-01 05:54:37,pptp-linux,/usr/sbin/pptp,<OLD>
2094,2010-06-08 10:06:29,2012-12-01 05:54:57,libfile-basedir-perl,/usr/share/perl5/File/BaseDir.pm,<OLD>
2095,2010-03-06 14:44:18,2012-12-01 05:54:37,laptop-detect,/usr/sbin/laptop-detect,<OLD>
2096,2010-02-22 14:59:21,2012-12-01 05:54:14,libfribidi0,/usr/bin/fribidi,<OLD>


Look at all packages that are not (~) libraries, i.e. whose names do not contain lib, and display the frame sorted by ctime
from the most recent time to the earliest

In [11]:
nonlibraries = popcon[~popcon['package-name'].str.contains('lib')]

In [12]:
nonlibraries.sort_values('ctime', ascending=False)[:10]

Unnamed: 0,atime,ctime,package-name,mru-program,tag
57,2013-12-17 04:55:39,2013-12-17 04:55:42,ddd,/usr/bin/ddd,<RECENT-CTIME>
450,2013-12-16 20:03:20,2013-12-16 20:05:13,nodejs,/usr/bin/npm,<RECENT-CTIME>
454,2013-12-16 20:03:20,2013-12-16 20:05:04,switchboard-plug-keyboard,/usr/lib/plugs/pantheon/keyboard/options.txt,<RECENT-CTIME>
445,2013-12-16 20:03:20,2013-12-16 20:05:04,thunderbird-locale-en,/usr/lib/thunderbird-addons/extensions/langpac...,<RECENT-CTIME>
396,2013-12-16 20:08:27,2013-12-16 20:05:03,software-center,/usr/sbin/update-software-center,<RECENT-CTIME>
449,2013-12-16 20:03:20,2013-12-16 20:05:00,samba-common-bin,/usr/bin/net.samba3,<RECENT-CTIME>
397,2013-12-16 20:08:25,2013-12-16 20:04:59,postgresql-client-9.1,/usr/lib/postgresql/9.1/bin/psql,<RECENT-CTIME>
398,2013-12-16 20:08:23,2013-12-16 20:04:58,postgresql-9.1,/usr/lib/postgresql/9.1/bin/postmaster,<RECENT-CTIME>
452,2013-12-16 20:03:20,2013-12-16 20:04:55,php5-dev,/usr/include/php5/main/snprintf.h,<RECENT-CTIME>
440,2013-12-16 20:03:20,2013-12-16 20:04:54,php-pear,/usr/share/php/XML/Util.php,<RECENT-CTIME>
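One caveat with the substring test: `str.contains('lib')` matches "lib" anywhere in the name, not only at the start. The sketch below uses a made-up list of names ('calibre' is not taken from the data) to compare it with `str.startswith`, which may be closer to the intent if library packages are assumed to begin with "lib".

```python
import pandas as pd

# Hypothetical package names; 'calibre' contains "lib" but is not a library.
names = pd.Series(['libtalloc2', 'calibre', 'nodejs', 'perl-base'])

# Substring test, as in the cell above: 'calibre' is dropped too.
by_contains = names[~names.str.contains('lib')].tolist()
print(by_contains)  # ['nodejs', 'perl-base']

# Prefix test: only names that start with "lib" are dropped.
by_prefix = names[~names.str.startswith('lib')].tolist()
print(by_prefix)  # ['calibre', 'nodejs', 'perl-base']
```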


Daily Apple stock prices over 5 years

In [4]:
import pandas as pd
df = pd.read_csv(r'c:\Users\1cons\Python-projects\other\apple.csv', index_col='Date', parse_dates=True)
df = df.sort_index()
print(df)
print(df.info())

                  Open        High         Low       Close     Volume  \
Date                                                                    
2012-02-23  515.079987  517.830009  509.499992  516.389977  142006900   
2012-02-24  519.669998  522.899979  518.640015  522.409981  103768000   
2012-02-27  521.309982  528.500000  516.280014  525.760017  136895500   
2012-02-28  527.960014  535.410011  525.850006  535.410011  150096800   
2012-02-29  541.560005  547.610023  535.700005  542.440025  238002800   
...                ...         ...         ...         ...        ...   
2017-02-15  135.520004  136.270004  134.619995  135.509995   35501600   
2017-02-16  135.669998  135.899994  134.839996  135.350006   22118000   
2017-02-17  135.100006  135.830002  135.100006  135.720001   22084500   
2017-02-21  136.229996  136.750000  135.979996  136.699997   24265100   
2017-02-22  136.429993  137.119995  136.110001  137.110001   20745300   

             Adj Close  
Date                    


Average closing price in May 2012

In [7]:
df.loc['2012-May', 'Close'].mean()

564.6731789999999

In [8]:
df.loc['2012-Feb':'2015-Feb', 'Close'].mean()

430.43968317018414
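Both cells above rely on partial string indexing of a DatetimeIndex: a string such as '2012-May' selects every row in that month, and a string slice includes both endpoints. A minimal sketch on synthetic data (the frame `demo` and its values are invented; it is not the Apple file):

```python
import numpy as np
import pandas as pd

# Synthetic daily series spanning the end of April to the start of June 2012.
idx = pd.date_range('2012-04-28', '2012-06-03', freq='D')
demo = pd.DataFrame({'Close': np.arange(len(idx), dtype=float)}, index=idx)

# A month string selects every row in that month.
may = demo.loc['2012-05', 'Close']
print(len(may))    # 31
print(may.mean())  # 18.0

# A string slice includes both endpoints.
span = demo.loc['2012-05-30':'2012-06-01', 'Close']
print(len(span))   # 3
```

This only works as shown when the index is sorted, which is why the notebook calls `sort_index()` after loading.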

Weekly averages: resample('W') sets the frequency to group by, and ['Close'] picks the column to average

In [9]:
df.resample('W')['Close'].mean()[:10]

Date
2012-02-26    519.399979
2012-03-04    538.652008
2012-03-11    536.254004
2012-03-18    576.161993
2012-03-25    600.990001
2012-04-01    609.698003
2012-04-08    626.484993
2012-04-15    623.773999
2012-04-22    591.718002
2012-04-29    590.536005
Freq: W-SUN, Name: Close, dtype: float64
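The W-SUN in the output means each weekly bin is labelled with the Sunday that ends it. The sketch below shows this on two synthetic business weeks (the frame `demo` is made up for the illustration):

```python
import pandas as pd

# Ten business days starting Monday 2012-02-20, with values 0..9.
idx = pd.date_range('2012-02-20', periods=10, freq='B')
demo = pd.DataFrame({'Close': range(10)}, index=idx)

# Each bin covers Monday to Sunday and is labelled with that Sunday.
weekly = demo.resample('W')['Close'].mean()
print(weekly)
# First week: mean of 0..4 = 2.0, labelled 2012-02-26
# Second week: mean of 5..9 = 7.0, labelled 2012-03-04
```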

In [14]:
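# This call fails: a github.com/.../blob/... URL returns the GitHub HTML page, not the raw text file
label = pd.read_csv('https://github.com/jacoxu/StackOverflow/blob/master/rawText/label_StackOverflow.txt')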

ParserError: Error tokenizing data. C error: Expected 1 fields in line 49, saw 2
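The error most likely comes from the URL rather than from the file: the /blob/ address serves a web page, while raw.githubusercontent.com (the host used for the popularity-contest file at the top) serves the file contents. A possible fix is sketched below. The helper `github_blob_to_raw` is hypothetical, written for this note, and the final `read_csv` call is left commented out because the layout of the label file is not shown here.

```python
def github_blob_to_raw(url: str) -> str:
    """Turn a github.com '/blob/' page URL into the matching raw-file URL."""
    prefix = 'https://github.com/'
    if not url.startswith(prefix) or '/blob/' not in url:
        raise ValueError(f'not a GitHub blob URL: {url}')
    # 'user/repo' comes before '/blob/', 'branch/path/to/file' after it.
    user_repo, branch_and_path = url[len(prefix):].split('/blob/', 1)
    return f'https://raw.githubusercontent.com/{user_repo}/{branch_and_path}'


page_url = ('https://github.com/jacoxu/StackOverflow/blob/master/'
            'rawText/label_StackOverflow.txt')
raw_url = github_blob_to_raw(page_url)
print(raw_url)

# Assumption: the file has no header row, so header=None may be needed.
# label = pd.read_csv(raw_url, header=None)
```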
