# Pandas tutorial: Day 4
Here's what we are going to do today:
* [Sorting data](#1)
* [Renaming columns](#2)
* [Defining a new column](#3)
* [Changing index name](#4)
* [Making all columns lowercase](#5)
* [Making all columns uppercase](#6)

Let's get started!

[Data: Daily News for Stock Market Prediction](https://www.kaggle.com/aaron7sun/stocknews)

In [1]:
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/stocknews/Combined_News_DJIA.csv
/kaggle/input/stocknews/RedditNews.csv
/kaggle/input/stocknews/upload_DJIA_table.csv


In [2]:
df = pd.read_csv('/kaggle/input/stocknews/upload_DJIA_table.csv')

## Sorting data<a id='1'></a>
**Method 1 (sort by a single column):** `df.sort_values(by=['column_name'], ascending=False/True)`

The default is `ascending=True`, i.e. from low to high. To sort in descending order (from high to low), pass `ascending=False`.
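A minimal sketch of both directions on a tiny made-up DataFrame (`prices` and its values are hypothetical, not rows from the DJIA file):

```python
import pandas as pd

# Hypothetical toy data, not taken from the Kaggle dataset
prices = pd.DataFrame({'Date': ['2016-06-29', '2016-07-01', '2016-06-30'],
                       'Close': [17694.68, 17949.37, 17929.99]})

# Default ascending=True: smallest Close first
low_to_high = prices.sort_values(by=['Close'])
# ascending=False: largest Close first
high_to_low = prices.sort_values(by=['Close'], ascending=False)

print(low_to_high['Close'].tolist())  # [17694.68, 17929.99, 17949.37]
print(high_to_low['Close'].tolist())  # [17949.37, 17929.99, 17694.68]
# The original index labels travel with their rows
print(high_to_low.index.tolist())     # [1, 2, 0]
```

`sort_values` returns a new, reordered DataFrame; `prices` itself keeps its original order.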

**Method 2 (sort by multiple columns):** `df.sort_values(by=['column_name1', 'column_name2', ...], ascending=False/True)`

Rows are ordered by the first column in the list; the later columns are only used to break ties.
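A small sketch with hypothetical data in which two rows share the same `Open` value, so `Close` decides their relative order. It also shows that `ascending` accepts a list with one flag per column:

```python
import pandas as pd

# Hypothetical toy data: rows 0 and 2 share the same Open value
prices = pd.DataFrame({'Open':  [100.0, 105.0, 100.0, 98.0],
                       'Close': [101.0, 104.0, 99.0, 97.0]})

# Both columns descending: Open first, Close only orders rows with equal Open
both_desc = prices.sort_values(by=['Open', 'Close'], ascending=False)
print(both_desc.index.tolist())  # [1, 0, 2, 3]

# One flag per column: Open descending, Close ascending
mixed = prices.sort_values(by=['Open', 'Close'], ascending=[False, True])
print(mixed.index.tolist())      # [1, 2, 0, 3]
```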

**Method 3:** `df.sort_index()`

This sorts the DataFrame by its index labels, from low to high by default.
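A sketch of `sort_index` on a hypothetical frame whose index labels are out of order, plus the common pattern of calling it after `sort_values` to get the original row order back when the index is the default `0, 1, 2, ...` (as it is for the DJIA frame here):

```python
import pandas as pd

# Hypothetical toy frame whose index labels are out of order
df_small = pd.DataFrame({'Close': [10.0, 30.0, 20.0]}, index=[2, 0, 1])

# Default: index labels from low to high
print(df_small.sort_index().index.tolist())                 # [0, 1, 2]
# ascending=False reverses the direction
print(df_small.sort_index(ascending=False).index.tolist())  # [2, 1, 0]

# With a default integer index, sort_index() restores the original
# row order after a sort_values() call
ranged = pd.DataFrame({'Close': [10.0, 30.0, 20.0]})
restored = ranged.sort_values(by=['Close']).sort_index()
print(restored.equals(ranged))                              # True
```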

In [6]:
df.sort_values(by = ['Date'], ascending = False)

| | Date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 0 | 2016-07-01 | 17924.240234 | 18002.380859 | 17916.910156 | 17949.369141 | 82160000 | 17949.369141 |
| 1 | 2016-06-30 | 17712.759766 | 17930.609375 | 17711.800781 | 17929.990234 | 133030000 | 17929.990234 |
| 2 | 2016-06-29 | 17456.019531 | 17704.509766 | 17456.019531 | 17694.679688 | 106380000 | 17694.679688 |
| 3 | 2016-06-28 | 17190.509766 | 17409.720703 | 17190.509766 | 17409.720703 | 112190000 | 17409.720703 |
| 4 | 2016-06-27 | 17355.210938 | 17355.210938 | 17063.080078 | 17140.240234 | 138740000 | 17140.240234 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1984 | 2008-08-14 | 11532.070312 | 11718.280273 | 11450.889648 | 11615.929688 | 159790000 | 11615.929688 |
| 1985 | 2008-08-13 | 11632.809570 | 11633.780273 | 11453.339844 | 11532.959961 | 182550000 | 11532.959961 |
| 1986 | 2008-08-12 | 11781.700195 | 11782.349609 | 11601.519531 | 11642.469727 | 173590000 | 11642.469727 |
| 1987 | 2008-08-11 | 11729.669922 | 11867.110352 | 11675.530273 | 11782.349609 | 183190000 | 11782.349609 |
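
The `Date` column here is read as plain text, because `read_csv` does not parse dates unless asked to. The descending sort above is still chronological because `YYYY-MM-DD` strings compare character by character in calendar order; a format such as `07/01/2016` would not. A minimal sketch with a small made-up CSV string (not the real file) comparing text sorting with `parse_dates`:

```python
import io

import pandas as pd

# Hypothetical CSV snippet shaped like upload_DJIA_table.csv
csv_text = """Date,Close
2016-06-30,101.0
2016-07-01,102.0
2015-12-31,99.0
"""

as_text = pd.read_csv(io.StringIO(csv_text))
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

print(pd.api.types.is_datetime64_any_dtype(as_text['Date']))   # False
print(pd.api.types.is_datetime64_any_dtype(as_dates['Date']))  # True

# For YYYY-MM-DD strings, text order and calendar order agree
text_order = as_text.sort_values(by=['Date'], ascending=False).index.tolist()
date_order = as_dates.sort_values(by=['Date'], ascending=False).index.tolist()
print(text_order, date_order)  # [1, 0, 2] [1, 0, 2]
```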


In [7]:
df.sort_values(by = ['Open', 'Close'], ascending = False)

| | Date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 282 | 2015-05-20 | 18315.060547 | 18350.130859 | 18272.560547 | 18285.400391 | 80190000 | 18285.400391 |
| 283 | 2015-05-19 | 18300.480469 | 18351.359375 | 18261.349609 | 18312.390625 | 87200000 | 18312.390625 |
| 280 | 2015-05-22 | 18286.869141 | 18286.869141 | 18217.140625 | 18232.019531 | 78890000 | 18232.019531 |
| 281 | 2015-05-21 | 18285.869141 | 18314.890625 | 18249.900391 | 18285.740234 | 84270000 | 18285.740234 |
| 337 | 2015-03-03 | 18281.949219 | 18281.949219 | 18136.880859 | 18203.369141 | 83830000 | 18203.369141 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1847 | 2009-03-03 | 6764.810059 | 6855.290039 | 6705.629883 | 6726.020020 | 445280000 | 6726.020020 |
| 1846 | 2009-03-04 | 6726.500000 | 6979.220215 | 6726.419922 | 6875.839844 | 464830000 | 6875.839844 |
| 1843 | 2009-03-09 | 6625.740234 | 6709.609863 | 6516.859863 | 6547.049805 | 365990000 | 6547.049805 |
| 1844 | 2009-03-06 | 6595.160156 | 6755.169922 | 6469.950195 | 6626.939941 | 425170000 | 6626.939941 |


In [5]:
df.sort_index()

| | Date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 0 | 2016-07-01 | 17924.240234 | 18002.380859 | 17916.910156 | 17949.369141 | 82160000 | 17949.369141 |
| 1 | 2016-06-30 | 17712.759766 | 17930.609375 | 17711.800781 | 17929.990234 | 133030000 | 17929.990234 |
| 2 | 2016-06-29 | 17456.019531 | 17704.509766 | 17456.019531 | 17694.679688 | 106380000 | 17694.679688 |
| 3 | 2016-06-28 | 17190.509766 | 17409.720703 | 17190.509766 | 17409.720703 | 112190000 | 17409.720703 |
| 4 | 2016-06-27 | 17355.210938 | 17355.210938 | 17063.080078 | 17140.240234 | 138740000 | 17140.240234 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1984 | 2008-08-14 | 11532.070312 | 11718.280273 | 11450.889648 | 11615.929688 | 159790000 | 11615.929688 |
| 1985 | 2008-08-13 | 11632.809570 | 11633.780273 | 11453.339844 | 11532.959961 | 182550000 | 11532.959961 |
| 1986 | 2008-08-12 | 11781.700195 | 11782.349609 | 11601.519531 | 11642.469727 | 173590000 | 11642.469727 |
| 1987 | 2008-08-11 | 11729.669922 | 11867.110352 | 11675.530273 | 11782.349609 | 183190000 | 11782.349609 |


## Renaming columns<a id='2'></a>
Syntax: `df.rename(columns={'Old_name': 'New_name'}, inplace=True)`

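`inplace=True` modifies the DataFrame itself instead of returning a renamed copy (the CSV file on disk is not touched). The default is `inplace=False`, in which case `rename` returns a new DataFrame and leaves `df` unchanged.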
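A minimal sketch on a hypothetical one-row frame (`df_demo` is illustrative only) contrasting the two modes. Reassigning the result is shown last as the usual alternative to `inplace=True`:

```python
import pandas as pd

# Hypothetical frame standing in for the DJIA data
df_demo = pd.DataFrame({'Date': ['2016-07-01'], 'Close': [1.0]})

# inplace=False (default): a renamed copy is returned, df_demo is untouched
renamed = df_demo.rename(columns={'Date': 'new_date'})
print(renamed.columns.tolist())  # ['new_date', 'Close']
print(df_demo.columns.tolist())  # ['Date', 'Close']

# inplace=True: df_demo itself is changed and the call returns None
result = df_demo.rename(columns={'Date': 'new_date'}, inplace=True)
print(result)                    # None
print(df_demo.columns.tolist())  # ['new_date', 'Close']

# Reassigning the returned copy is the usual alternative to inplace=True
df_demo = df_demo.rename(columns={'new_date': 'Date'})
print(df_demo.columns.tolist())  # ['Date', 'Close']
```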

In [9]:
df.rename(columns= {'Date' : 'new_date'}).head()

| | new_date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 0 | 2016-07-01 | 17924.240234 | 18002.380859 | 17916.910156 | 17949.369141 | 82160000 | 17949.369141 |
| 1 | 2016-06-30 | 17712.759766 | 17930.609375 | 17711.800781 | 17929.990234 | 133030000 | 17929.990234 |
| 2 | 2016-06-29 | 17456.019531 | 17704.509766 | 17456.019531 | 17694.679688 | 106380000 | 17694.679688 |
| 3 | 2016-06-28 | 17190.509766 | 17409.720703 | 17190.509766 | 17409.720703 | 112190000 | 17409.720703 |
| 4 | 2016-06-27 | 17355.210938 | 17355.210938 | 17063.080078 | 17140.240234 | 138740000 | 17140.240234 |


In [12]:
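# The rename above was not in place, so df still has its 'Date' column.
# Renaming 'new_date' back is therefore a no-op: rename ignores labels it cannot find (errors='ignore' by default).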
df.rename(columns= {'new_date' : 'Date'}).head(1)

| | Date | Open | High | Low | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|---|
| 0 | 2016-07-01 | 17924.240234 | 18002.380859 | 17916.910156 | 17949.369141 | 82160000 | 17949.369141 |


## Defining a new column<a id='3'></a>
To create your own column, assign the result of an expression to a new column name:

Syntax: `df['new_column_name'] = expression`

In [13]:
df['Difference'] = df.High - df.Low
df.head()

| | Date | Open | High | Low | Close | Volume | Adj Close | Difference |
|---|---|---|---|---|---|---|---|---|
| 0 | 2016-07-01 | 17924.240234 | 18002.380859 | 17916.910156 | 17949.369141 | 82160000 | 17949.369141 | 85.470703 |
| 1 | 2016-06-30 | 17712.759766 | 17930.609375 | 17711.800781 | 17929.990234 | 133030000 | 17929.990234 | 218.808594 |
| 2 | 2016-06-29 | 17456.019531 | 17704.509766 | 17456.019531 | 17694.679688 | 106380000 | 17694.679688 | 248.490235 |
| 3 | 2016-06-28 | 17190.509766 | 17409.720703 | 17190.509766 | 17409.720703 | 112190000 | 17409.720703 | 219.210937 |
| 4 | 2016-06-27 | 17355.210938 | 17355.210938 | 17063.080078 | 17140.240234 | 138740000 | 17140.240234 | 292.13086 |


In this way we can design our own custom columns.
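A few variations, sketched on a hypothetical two-row frame: column arithmetic as above, a scalar that pandas broadcasts to every row, and `DataFrame.assign`, which returns a copy with the new column instead of changing the original. The column names `Source` and `Change` are made up for the example:

```python
import pandas as pd

# Hypothetical rows with the same High/Low/Open/Close columns as the DJIA table
df_demo = pd.DataFrame({'Open': [100.0, 200.0], 'High': [110.0, 205.0],
                        'Low': [95.0, 190.0], 'Close': [105.0, 195.0]})

# Element-wise arithmetic between columns, as in df.High - df.Low
df_demo['Difference'] = df_demo['High'] - df_demo['Low']

# A scalar on the right-hand side is broadcast to every row
df_demo['Source'] = 'DJIA'

# assign() adds the column to a copy; df_demo is not modified
with_change = df_demo.assign(Change=df_demo['Close'] - df_demo['Open'])

print(df_demo['Difference'].tolist())  # [15.0, 15.0]
print(df_demo['Source'].tolist())      # ['DJIA', 'DJIA']
print(with_change['Change'].tolist())  # [5.0, -5.0]
print('Change' in df_demo.columns)     # False
```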

## Changing index name<a id='4'></a>

In [14]:
# check for the current name of the index
print(df.index.name)

None


In [16]:
# give the index a name
df.index.name = 'index'
df.head()

| index | Date | Open | High | Low | Close | Volume | Adj Close | Difference |
|---|---|---|---|---|---|---|---|---|
| 0 | 2016-07-01 | 17924.240234 | 18002.380859 | 17916.910156 | 17949.369141 | 82160000 | 17949.369141 | 85.470703 |
| 1 | 2016-06-30 | 17712.759766 | 17930.609375 | 17711.800781 | 17929.990234 | 133030000 | 17929.990234 | 218.808594 |
| 2 | 2016-06-29 | 17456.019531 | 17704.509766 | 17456.019531 | 17694.679688 | 106380000 | 17694.679688 | 248.490235 |
| 3 | 2016-06-28 | 17190.509766 | 17409.720703 | 17190.509766 | 17409.720703 | 112190000 | 17409.720703 | 219.210937 |
| 4 | 2016-06-27 | 17355.210938 | 17355.210938 | 17063.080078 | 17140.240234 | 138740000 | 17140.240234 | 292.13086 |
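
Setting `df.index.name` changes `df` directly. `DataFrame.rename_axis` does the same job but returns a copy, in the same way `rename` does for columns. A sketch on a hypothetical frame (`df_demo` is illustrative only):

```python
import pandas as pd

# Hypothetical frame with an unnamed default index
df_demo = pd.DataFrame({'Close': [1.0, 2.0]})
print(df_demo.index.name)  # None

# rename_axis returns a copy whose index is named; df_demo is unchanged
named = df_demo.rename_axis('index')
print(named.index.name)    # index
print(df_demo.index.name)  # None

# Assigning the attribute, as in the cell above, changes df_demo itself
df_demo.index.name = 'index'
print(df_demo.index.name)  # index
```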


## Making all columns lowercase<a id='5'></a>

In [20]:
df.columns = map(str.lower, df.columns)
df.columns

Index(['date', 'open', 'high', 'low', 'close', 'volume', 'adj close',
       'difference'],
      dtype='object')
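
Assigning the `map` object works here; two other common spellings give the same labels. A sketch on a hypothetical empty frame with DJIA-style headers:

```python
import pandas as pd

# Hypothetical empty frame with the same mixed-case headers as the DJIA table
df_demo = pd.DataFrame(columns=['Date', 'Open', 'Adj Close'])

# Vectorised string method on the column Index
lower_a = df_demo.columns.str.lower()

# rename() also accepts a function and applies it to every column label
lower_b = df_demo.rename(columns=str.lower).columns

print(list(lower_a))  # ['date', 'open', 'adj close']
print(list(lower_b))  # ['date', 'open', 'adj close']
```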

## Making all columns uppercase<a id='6'></a>

In [21]:
df.columns = map(str.upper, df.columns)
df.columns

Index(['DATE', 'OPEN', 'HIGH', 'LOW', 'CLOSE', 'VOLUME', 'ADJ CLOSE',
       'DIFFERENCE'],
      dtype='object')
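
Changing the case does not remove the space in `ADJ CLOSE`, so unlike `df.High` in the new-column cell, that column cannot be read as an attribute. A sketch, on hypothetical data, of replacing spaces with underscores so that attribute access works:

```python
import pandas as pd

# Hypothetical frame using the upper-cased DJIA headers
df_demo = pd.DataFrame({'CLOSE': [1.0], 'ADJ CLOSE': [1.0]})

# A label containing a space can only be reached with brackets
print(df_demo['ADJ CLOSE'].tolist())  # [1.0]

# Replace spaces with underscores in every column label
df_demo.columns = df_demo.columns.str.replace(' ', '_')
print(df_demo.columns.tolist())       # ['CLOSE', 'ADJ_CLOSE']
print(df_demo.ADJ_CLOSE.tolist())     # [1.0]
```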

That's all for today! We have learned how to sort data, rename columns, define new columns, name the index and change the case of column names. In the next tutorial we'll see how to drop data and convert data types.