# Weekly Challenge 19

*Original URL* https://community.alteryx.com/t5/Weekly-Challenge/Challenge-19-Excel-Record-Locator/td-p/36748 and [**My Alteryx Approach**](https://github.com/dsmdavid/Alteryx-Weekly-Challenge/tree/master/submitted/sub_Challenge%2319)

## Brief
Use Case: A customer has hundreds of xls files with one common sheet available in all workbooks. Through one process, the user would like to read across all of the xls files and return the values contained in specific cells (Row 2, Column 3 and Row 8, Column 2) for each sheet within each XLS workbook.


The result should be a table OR browse tool containing 3 columns: XLS File, Row2_Column3, and Row8_Column2.


You will only have 2 xls files for this challenge, Book1 and Book2, but keep in mind that the use case is for hundreds of Excel files with the same schema. You won't want to use 2 input tools, since that would not scale to hundreds of files. Also, for all data consumption, please check the box for First Row Contains Data. This is because the headers for an Excel file are in row 1, so that row has to be read as data for the cell positions above to line up.


Good luck and keep it simple, this should be an easy challenge!

In [1]:
import pandas as pd
import os
from glob import glob


In [2]:
# INPUTS:
# What are the combinations needed -- converted to 0 indexed
RESULTS_NEEDED = [(1,2),(7,1)]
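
The conversion mentioned in the comment can be written out instead of done by hand. This is only a small sketch: `CELLS_REQUESTED` is a hypothetical name I introduce here to hold the 1-indexed positions from the brief.

```python
# Cell positions as stated in the brief: (row, column), counted from 1.
# CELLS_REQUESTED is a hypothetical name used only for this illustration.
CELLS_REQUESTED = [(2, 3), (8, 2)]

# Subtract 1 from each coordinate to get the 0-indexed positions iloc expects.
RESULTS_NEEDED = [(row - 1, column - 1) for row, column in CELLS_REQUESTED]
print(RESULTS_NEEDED)  # [(1, 2), (7, 1)]
```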

## Approach I want to follow:
1. Create a function to return any Row & Column combination from a df as a dictionary.  
1. List and process files to read.  
1. Collect the requested values for every file and convert the results to a DataFrame.

## 1. Create a function to read any row/column combination with headers:

In [3]:
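# Create a function to read any row & column combination
def get_values(row, column, df):
    '''
    Return the value at a 0-indexed (row, column) position of df as a
    one-item dictionary keyed by the 1-indexed position.
    row, column = integers within the bounds of df.shape
    df = pandas DataFrame
    e.g. get_values(0, 1, df) -> {'Row_1_Column_2': 47}
    '''
    # the +1 converts back to the 1-indexed numbering of the original request
    return {f'Row_{row + 1}_Column_{column + 1}': df.iloc[row, column]}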
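
A quick check of the function on a toy DataFrame, before touching any Excel file. The 3x3 frame below is made up for illustration, and `get_values` is restated so the snippet runs on its own.

```python
import pandas as pd

def get_values(row, column, df):
    """Return {'Row_<r>_Column_<c>': value} for a 0-indexed row/column pair."""
    return {f"Row_{row + 1}_Column_{column + 1}": df.iloc[row, column]}

# Hypothetical 3x3 sheet, shaped like a file read with header=None
# (integer column labels, first row kept as data).
toy = pd.DataFrame([["a", "b", "c"], [10, 20, 30], [40, 50, 60]])

# 0-indexed (1, 2) is Row 2, Column 3 in the numbering of the brief.
cell = get_values(1, 2, toy)
print(cell)  # {'Row_2_Column_3': 30}
```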

## 2. List all files matching the pattern and read them

In [4]:
# List files to read
# navigate to dir
try:
    os.chdir(os.path.join(os.getcwd(), '19_files'))
except FileNotFoundError:
    pass # no '19_files' subfolder here, so probably already inside it
# get the files that match the pattern --here, all xlsx files
filenames = glob('*.xlsx')
filenames

['book1.xlsx', 'Book2.xlsx']
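
An alternative I could have used is `pathlib`, which lists the files without changing the working directory. This is a sketch only: `list_workbooks` is a hypothetical helper, and the demonstration runs against empty placeholder files in a temporary folder rather than the real workbooks.

```python
from pathlib import Path
import tempfile

def list_workbooks(folder, pattern="*.xlsx"):
    """Return matching paths in `folder`, sorted case-insensitively by name."""
    return sorted(Path(folder).glob(pattern), key=lambda p: p.name.lower())

# Demonstration on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Book2.xlsx", "book1.xlsx", "notes.txt"):
        (Path(tmp) / name).touch()
    found = [p.name for p in list_workbooks(tmp)]

print(found)  # ['book1.xlsx', 'Book2.xlsx']
```

Because the helper returns full paths, each one can be passed straight to `pd.read_excel`, and `Path.stem` gives the file name without its extension.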

In [5]:
# Create an empty dictionary to store values
results = {}
# Iterate through the filenames
for filename in filenames:
    # header=None keeps the first sheet row as data (First Row Contains Data)
    read_xls = pd.read_excel(filename, header=None)
    # Drop extension and capitalize text
    book = os.path.splitext(filename)[0].capitalize()
    results[book] = {}
    # Iterate through combinations of row_columns needed
    for row, column in RESULTS_NEEDED:
        results[book].update(get_values(row, column, read_xls))
results

{'Book1': {'Row_2_Column_3': 47, 'Row_8_Column_2': 30},
 'Book2': {'Row_2_Column_3': 94, 'Row_8_Column_2': 60}}

In [6]:
# convert to df
df_results = pd.DataFrame.from_dict(results, orient='index')
df_results

| | Row_8_Column_2 | Row_2_Column_3 |
|---|---|---|
| Book1 | 30 | 47 |
| Book2 | 60 | 94 |
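
The brief asks for three columns, with the file name as a regular column rather than the index. One possible way to get there, sketched with the values from the results above hard-coded so the snippet runs on its own:

```python
import pandas as pd

# Values copied from the results dictionary shown earlier.
results = {
    "Book1": {"Row_2_Column_3": 47, "Row_8_Column_2": 30},
    "Book2": {"Row_2_Column_3": 94, "Row_8_Column_2": 60},
}

df_results = pd.DataFrame.from_dict(results, orient="index")

# Name the index, move it into a regular column, and set the column
# order explicitly instead of relying on dictionary ordering.
table = (
    df_results
    .rename_axis("XLS File")
    .reset_index()
    .loc[:, ["XLS File", "Row_2_Column_3", "Row_8_Column_2"]]
)
print(table)
```

Selecting the columns by name at the end keeps the order stable regardless of how the dictionary was built.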
