# Basic Sorting 

Now that we know how to work with the dataframes read in from the feather
files, we can build on the last problem. In this exercise, we take the answers
and questions dataframe files built in the last exercise, together with the
comments feather file, and sort all three in descending order by the Score
column. We then write the sorted dataframes out to three new files called
'Questions-Sorted.feather', 'Answers-Sorted.feather', and 'Comments-Sorted.feather'.
In addition, we again write out only the Ids to plain text so that they can be
compared against a gold standard when assessed. These files will be
'Questions-Sorted.txt', 'Answers-Sorted.txt', and 'Comments-Sorted.txt'. Please
*note* that the file names are case-sensitive and must match exactly as shown above.

### Read a feather file and return a dataframe. This is already done for you; you just have to call it from main to convert a feather file into a dataframe.

In [1]:
import pyarrow.feather as feather
import pandas as pd

In [2]:
def arrow_to_df(input_file_name):
    df = feather.read_feather(input_file_name)
    return df

### Write a feather file using a dataframe. This is done for you; you just need to call it.

In [3]:
def df_to_arrow(output_file_name, df):
    feather.write_feather(df, output_file_name, compression='zstd')
    return

### This function will write the Ids out to a file from a dataframe. You simply need to pass in a dataframe, and a file name to write, like 'ids.txt'.

In [4]:
def write_ids_to_file(df, out_file):
    with open(out_file, 'w') as f:
        # Write the Ids, one per line. The text asks for Ids rather than
        # Scores; this assumes the identifier column is named 'Id'.
        int_arr = df['Id'].to_list()
        str_arr = list(map(str, int_arr))
        f.write('\n'.join(str_arr))
        # If you wish to enable debug and see the output, uncomment
        # the two lines below.
        #print_str = ' '.join(str_arr)
        #print('{}:{}'.format(out_file, print_str))
    return
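
As a quick illustration of what the Ids file looks like, the sketch below runs a compact version of the helper on a made-up three-row dataframe and reads the result back. The toy data, the temporary file name `toy-ids.txt`, and the assumption that the identifier column is called `Id` are illustrative only and are not part of the exercise data.

```python
import os
import tempfile

import pandas as pd


def write_ids_to_file(df, out_file):
    # Compact variant of the helper: one Id per line, no trailing newline.
    # Assumes the identifier column is named 'Id'.
    with open(out_file, "w") as f:
        f.write("\n".join(map(str, df["Id"].to_list())))


# Hypothetical toy data, already in descending Score order.
toy = pd.DataFrame({"Id": [11, 13, 10], "Score": [9, 7, 3]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy-ids.txt")
    write_ids_to_file(toy, path)
    with open(path) as f:
        contents = f.read()

print(repr(contents))
```

The Ids appear in the same order as the rows of the dataframe, so the text file reflects whatever sort order was applied before the call.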
        

### Sort a dataframe based on a key in descending order and return the sorted dataframe. Hint: the key is probably 'Score', and you should pay attention to keeping the index values for the rows correct. There is a parameter called ignore_index that may be helpful.

In [5]:
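def sort_df(df, key):
    # Sort by the given column, largest value first. ignore_index=True
    # renumbers the rows 0..n-1 instead of keeping the old index labels.
    new_df = df.sort_values(by=[key], ascending=False, ignore_index=True)
    return new_df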
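
To see what `ignore_index` changes, the sketch below sorts a small made-up dataframe twice: once keeping the original index labels and once through `sort_df`. The toy data is hypothetical and deliberately has no tied scores, since the relative order of ties may depend on the sort algorithm used.

```python
import pandas as pd


def sort_df(df, key):
    # Descending sort with a fresh 0..n-1 index.
    return df.sort_values(by=[key], ascending=False, ignore_index=True)


# Hypothetical toy data with distinct scores.
toy = pd.DataFrame({"Id": [10, 11, 12, 13], "Score": [3, 9, 1, 7]})

kept = toy.sort_values(by=["Score"], ascending=False)  # old index labels travel with the rows
reset = sort_df(toy, "Score")                          # index is renumbered

print(kept.index.tolist())   # [1, 3, 0, 2]
print(reset.index.tolist())  # [0, 1, 2, 3]
print(reset["Id"].tolist())  # [11, 13, 10, 12]
```

Without `ignore_index=True`, the rows are in the right order but still carry their original index labels, which can be confusing when the dataframe is written out and read back later.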

### Main Loop:
* First, read the three feather files into dataframes using the provided function.
* Next, get the sort_df function working to generate the three new dataframes containing Questions, Answers, and Comments in sorted order as defined above.
* Then write out the three new dataframes (Answers-Sorted.feather, Questions-Sorted.feather, and Comments-Sorted.feather) for later.
* Finally, call write_ids_to_file() for the three new dataframes. The output file names should be "Answers-Sorted.txt", "Questions-Sorted.txt", and "Comments-Sorted.txt". This will be our sanity check that you got the sort function correct. Make sure you use the output file names exactly as shown (case-sensitive).
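
The exact procedure used to compare the text files against the gold standard is not described here. As a rough idea of what such a sanity check could look like, the sketch below compares two Ids files line by line and reports the first line that differs. The helper name `first_mismatch` and the toy file names are hypothetical.

```python
import os
import tempfile


def first_mismatch(produced_path, expected_path):
    """Return the 1-based line number of the first difference, or None if the files match."""
    with open(produced_path) as f:
        got_lines = f.read().splitlines()
    with open(expected_path) as f:
        want_lines = f.read().splitlines()
    for line_no, (got, want) in enumerate(zip(got_lines, want_lines), start=1):
        if got != want:
            return line_no
    # All shared lines agree; a length difference is still a mismatch.
    if len(got_lines) != len(want_lines):
        return min(len(got_lines), len(want_lines)) + 1
    return None


with tempfile.TemporaryDirectory() as tmp:
    produced = os.path.join(tmp, "Toy-Sorted.txt")
    gold = os.path.join(tmp, "Toy-Gold.txt")
    with open(produced, "w") as f:
        f.write("11\n13\n10")
    with open(gold, "w") as f:
        f.write("11\n10\n13")
    differing = first_mismatch(produced, gold)
    matching = first_mismatch(gold, gold)

print(differing)  # 2
print(matching)   # None
```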

In [6]:

def main():

    # Read the feather files into dataframes
    comments_df = arrow_to_df('Comments.feather')
    ans_df = arrow_to_df('Answers.feather')
    ques_df = arrow_to_df('Questions.feather')

    # Sort the comments, answers, and questions dataframes by Score
    new_comment_df = sort_df(comments_df, 'Score')
    new_ans_df = sort_df(ans_df, 'Score')
    new_ques_df = sort_df(ques_df, 'Score')

    # Write the sorted dataframes out for later
    df_to_arrow('Answers-Sorted.feather', new_ans_df)
    df_to_arrow('Questions-Sorted.feather', new_ques_df)
    df_to_arrow('Comments-Sorted.feather', new_comment_df)

    # Write the Ids of each sorted dataframe to plain text
    write_ids_to_file(new_comment_df, 'Comments-Sorted.txt')
    write_ids_to_file(new_ans_df, 'Answers-Sorted.txt')
    write_ids_to_file(new_ques_df, 'Questions-Sorted.txt')

    return

In [7]:

if __name__ == '__main__':
    main()
    print ('[INFO] Script completed with no errors.')

[INFO] Script completed with no errors.
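
The completion message only shows that nothing raised an exception. One optional way to confirm that a dataframe really is in descending order is to check its Score column with pandas' `is_monotonic_decreasing` property, as in the minimal sketch below. The helper name `is_sorted_descending` and the toy data are hypothetical.

```python
import pandas as pd


def is_sorted_descending(df, key="Score"):
    # True when each value is less than or equal to the one before it.
    return bool(df[key].is_monotonic_decreasing)


# Hypothetical toy data; tied scores are fine for this check.
unsorted = pd.DataFrame({"Id": [10, 11, 12], "Score": [3, 9, 9]})
ordered = unsorted.sort_values(by=["Score"], ascending=False, ignore_index=True)

print(is_sorted_descending(unsorted))  # False
print(is_sorted_descending(ordered))   # True
```

The same check could be applied to each sorted dataframe after reading the corresponding '-Sorted.feather' file back in.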
