![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/python_logo.png)

# Working with Dates

In [None]:
import pandas as pd
import numpy as np

Load in the Excel data that represents a year's worth of sales.

In [None]:
df = pd.read_excel("https://github.com/fjvarasc/DSPXI/blob/master/data/sample-salesv3.xlsx?raw=true")
#The code below makes sure that the 'date' column from the Excel file is a datetime field in the DataFrame
df['date'] = df['date'].astype('datetime64[ns]')

Using pandas, you can do complex filtering on dates. Before doing anything with dates, I encourage you to sort by the date column to make sure the results return what you are expecting.

In [None]:
df = df.sort_values(by=['date'])
df.head()


The Python filtering syntax shown earlier also works with dates.

In [None]:
df[df['date'] >='2014-09-05'].head()

One of the really nice features of pandas is that it understands dates, so it allows us to do partial filtering. If we only want to look for data more recent than a specific month, we can do so.

In [None]:
df[df['date'] >='2014-03'].head()
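As a minimal sketch of what the partial string means, the comparison treats `'2014-03'` as midnight on the first day of that month. The `sample` frame below is made up for illustration and is not part of the sales data.

```python
import pandas as pd

# Hypothetical three-row frame standing in for the sales data.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2014-02-28", "2014-03-01", "2014-03-15"]),
    "quantity": [1, 2, 3],
})

# A year-month string is parsed as the first instant of that month.
cutoff = pd.Timestamp("2014-03")

# Comparing the datetime column against the string keeps the rows
# on or after that instant.
recent = sample[sample["date"] >= "2014-03"]
print(cutoff)
print(recent)
```

The February row is dropped because it falls before the parsed cutoff.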

Of course, you can chain the criteria.

In [None]:
df[(df['date'] >='20140702') & (df['date'] <= '2014-07-15')].head()

Because pandas understands date columns, you can express the date value in multiple formats and it will give you the results you expect.

In [None]:
df[df['date'] >= 'Oct-2014'].head()

In [None]:
df[df['date'] >= '102014'].head()

When working with time series data, if we convert the data to use the date as the index, we can do some more filtering.

Set the new index using `set_index`.

In [None]:
df2 = df.set_index(['date'])
df2.head()

We can slice the data to get a range.

In [None]:
df2["2014-01-01":"2014-02-01"].head()

Once again, we can use various date representations to remove any ambiguity around date naming conventions.

In [None]:
df2["2014-Jan-1":"2014-Feb-1"].head()

In [None]:
df2["2014-Jan-1":"2014-Feb-1"].tail()

In [None]:
df2.loc["2014"].head()

In [None]:
df2.loc["2014-Dec"].head()
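The sketch below illustrates two properties of date-indexed selection that are easy to miss: label slices include both endpoints, and a partial string selects the whole period. The series `ts` is hypothetical and only stands in for the date-indexed sales data.

```python
import pandas as pd

# Hypothetical series indexed by date (not the sales data).
idx = pd.to_datetime(
    ["2013-12-31", "2014-01-01", "2014-01-20", "2014-02-01", "2014-02-02"]
)
ts = pd.Series([10, 20, 30, 40, 50], index=idx)

# Label slices on a sorted DatetimeIndex include both endpoints.
jan_to_feb1 = ts.loc["2014-01-01":"2014-02-01"]

# A partial string selects every row that falls inside that period.
january = ts.loc["2014-01"]
year_2014 = ts.loc["2014"]

print(jan_to_feb1)
print(january)
print(year_2014)
```

Using `.loc` makes it explicit that the selection is on row labels rather than column names.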

# Additional String Functions

Pandas has support for vectorized string functions as well. If we want to identify all the skus that contain a certain value, we can use `str.contains`. In this case, we know that the sku is always represented in the same way, so B1 only shows up in the front of the sku.

In [None]:
df[df['sku'].str.contains('B1')].head()
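Note that `str.contains` matches the text anywhere in the value. If the match should be anchored to the front of the sku, `str.startswith` is a stricter alternative. The sketch below uses made-up skus to show the difference.

```python
import pandas as pd

# Hypothetical skus; "S2-B1000" contains "B1" but does not start with it.
skus = pd.Series(["B1-20000", "S1-65481", "S2-B1000", "B1-53102"])

anywhere = skus[skus.str.contains("B1")]        # substring match anywhere
prefix_only = skus[skus.str.startswith("B1")]   # match only at the start

print(anywhere.tolist())
print(prefix_only.tolist())
```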

We can string queries together and use sort to control how the data is ordered.

A common need in Excel is to understand all the unique items in a column. For instance, maybe we only want to know when customers purchased in this time period. The unique function makes this trivial.

In [None]:
df[(df['sku'].str.contains('B1-531')) & (df['quantity']>40)].sort_values(by=['quantity','name'],ascending=[False,True])

# Bonus Task

A very frequent scenario is trying to get a list of unique items in a long list within Excel. It is a multi-step process to do this in Excel but is fairly simple in pandas. We just use the [unique](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.unique.html) function on a column to get the list.

In [None]:
df["name"].unique()

If we wanted to include the account number, we could use `drop_duplicates`.

In [None]:
df.drop_duplicates(subset=["account number","name"]).head()

We are obviously pulling in more data than we need and getting some non-useful information, so select only the first and second columns using [`iloc`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html).

In [None]:
df.drop_duplicates(subset=["account number","name"]).iloc[:,[0,1]]

Now we encourage you to try to apply these ideas to some of your own repetitive Excel tasks and streamline your workflow.

# Guided Lab : Collecting the Data

Import pandas and numpy

In [None]:
import pandas as pd
import numpy as np

Let's import each of our files and combine them into one DataFrame. [pandas' concat](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html) can do this for us. We use `pd.concat` in this example because `DataFrame.append` was removed in pandas 2.0.

The code snippet below reads each of the individual files and concatenates them into the all_data DataFrame.

In [None]:
df1 = pd.read_excel('https://github.com/fjvarasc/DSPXI/blob/master/data/sales-mar-2014.xlsx?raw=true')
df2 = pd.read_excel('https://github.com/fjvarasc/DSPXI/blob/master/data/sales-feb-2014.xlsx?raw=true')
df3 = pd.read_excel('https://github.com/fjvarasc/DSPXI/blob/master/data/sales-jan-2014.xlsx?raw=true')
# ignore_index=True renumbers the rows of the combined DataFrame from 0
all_data = pd.concat([df1,df2,df3],ignore_index=True)
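To see what `ignore_index=True` does when frames are stacked, here is a minimal sketch using `pd.concat` on three small made-up frames. The frame names and values are hypothetical stand-ins for the monthly workbooks.

```python
import pandas as pd

# Hypothetical stand-ins for the three monthly workbooks.
jan = pd.DataFrame({"account number": [1, 2], "quantity": [5, 7]})
feb = pd.DataFrame({"account number": [2, 3], "quantity": [4, 9]})
mar = pd.DataFrame({"account number": [1], "quantity": [6]})

# Each frame has its own 0..n index; ignore_index=True discards those
# and renumbers the combined rows from 0.
combined = pd.concat([jan, feb, mar], ignore_index=True)
print(combined)
```

Without `ignore_index=True` the combined frame would keep the repeated index labels 0, 1, 0, 1, 0.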

In [None]:
all_data.dtypes

Now we have all the data in our all_data DataFrame. One detail remains: as we saw before, the date field is not recognized as a date. This is not critical, but the best practice is to convert the date column to a datetime object. We can do so by running the code below.

In [None]:
all_data['date'] = all_data['date'].astype('datetime64[ns]')
# pd.to_datetime(all_data['date']) is an equivalent way to do the same conversion
all_data.dtypes
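If some date values might be malformed, `pd.to_datetime` with `errors="coerce"` is one way to convert the column without raising an error. The sketch below uses a made-up series; the sales files are assumed to be clean, so this is only a precaution.

```python
import pandas as pd

# Hypothetical raw values: one valid date, one impossible date, one non-date.
raw = pd.Series(["2014-01-15", "2014-02-30", "not a date"])

# errors="coerce" turns anything that cannot be parsed into NaT
# instead of raising an exception.
parsed = pd.to_datetime(raw, errors="coerce")

# The original values that failed to parse, for inspection.
bad_rows = raw[parsed.isna()]
print(parsed)
print(bad_rows)
```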

You can use describe to look at it and make sure your data looks good.

In [None]:
all_data.describe()

A lot of this output may not make much sense for this data set, but for our purpose we are interested in the count row to make sure the number of data elements makes sense.

In [None]:
all_data.head()

## Combining Data

Now that we have all of the data into one DataFrame, we can do any manipulations the DataFrame supports. In this case, the next thing we want to do is read in another file that contains the customer status by account. You can think of this as a company's customer segmentation strategy or some other mechanism for identifying their customers.

First, we read in the data.

In [None]:
status = pd.read_excel("https://github.com/fjvarasc/DSPXI/blob/master/data/customer-status.xlsx?raw=true")
status

We want to merge this data with our concatenated data set of sales. We use pandas' merge function and tell it to do a left join, which is similar to Excel's VLOOKUP function.

In [None]:
all_data_st = pd.merge(all_data, status, how='left')
all_data_st.head()

This looks pretty good but let's look at a specific account.

In [None]:
all_data_st[all_data_st["account number"]==737550].head()

This account number was not in our status file, so we have a bunch of NaN's. We can decide how we want to handle this situation. For this specific case, let's label all missing accounts as bronze. Use the fillna function to easily accomplish this on the status column.

In [None]:
all_data_st['status'] = all_data_st['status'].fillna('bronze')
all_data_st.head()

Check the data just to make sure we're all good.

In [None]:
all_data_st[all_data_st["account number"]==737550].head()

Now we have all of the data along with the status column filled in. We can do our normal data manipulations using the full suite of pandas capability.
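One way to find accounts that have no match in the status file, before deciding how to fill them, is the `indicator` argument of `pd.merge`. The sketch below uses two small hypothetical frames, and it names the join column explicitly with `on`, which the merge above leaves implicit.

```python
import pandas as pd

# Hypothetical sales and status tables; account 200 has no status row.
sales = pd.DataFrame({"account number": [100, 200, 300], "quantity": [3, 8, 1]})
status = pd.DataFrame({"account number": [100, 300], "status": ["gold", "silver"]})

# indicator=True adds a "_merge" column telling where each row came from.
merged = pd.merge(sales, status, how="left", on="account number", indicator=True)

# Rows found only in the left table are the unmatched accounts.
unmatched = merged[merged["_merge"] == "left_only"]
print(merged)
print(unmatched)
```

A left join keeps every sales row, so the unmatched account still appears, with NaN in its status column.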

## Using Categories

One of the relatively new features in pandas is support for categorical data. From the pandas [documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html):

"Categoricals are a pandas data type, which correspond to categorical variables in statistics: a variable, which can take on only a limited, and usually fixed, number of possible values (categories; levels in R). Examples are gender, social class, blood types, country affiliations, observation time or ratings via Likert scales."

For our purposes, the status field is a good candidate for a category type.

First, we `typecast` it to a category using [astype](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.astype.html).

In [None]:
all_data_st["status"] = all_data_st["status"].astype("category")

This doesn't immediately appear to change anything yet.

In [None]:
all_data_st.head()

But you can see that it is a new data type.

In [None]:
all_data_st.dtypes

Categories get more interesting when you assign order to the categories. Right now, if we call sort on the column, it will sort alphabetically. 

In [None]:
all_data_st.sort_values(by=["status"]).head()

We use set_categories to tell it the order we want to use for this category object. In this case, we use the Olympic medal ordering.

In [None]:
all_data_st["status"] = all_data_st["status"].cat.set_categories(["gold","silver","bronze"])

Now, we can sort it so that gold shows on top.

In [None]:
all_data_st.sort_values(by=["status"]).head()
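The effect of the category order on sorting can be sketched on a tiny made-up series. The values in `tiers` are hypothetical.

```python
import pandas as pd

# Hypothetical status values converted to a categorical column.
tiers = pd.Series(["bronze", "gold", "silver", "gold"]).astype("category")

# Default categories are sorted alphabetically, so bronze comes first.
alphabetical = tiers.sort_values().tolist()

# After setting the categories explicitly, sorting follows that order.
medal_order = tiers.cat.set_categories(["gold", "silver", "bronze"])
by_medal = medal_order.sort_values().tolist()

print(alphabetical)
print(by_medal)
```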

In [None]:
all_data_st["status"].describe()

For instance, you may want to take a quick look at how your top-tier customers are performing compared to the bottom tier. Use groupby to get the average of the values.

In [None]:
all_data_st.groupby(["status"])[["quantity","unit price","ext price"]].mean()

Of course, you can run multiple aggregation functions on the data to get really useful information.

In [None]:
all_data_st.groupby(["status"])[["quantity","unit price","ext price"]].agg(["sum","mean","std"])
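When the result is going into a report, named aggregation is one way to get flat, readable column names instead of a two-level header. This is a minimal sketch on a hypothetical `orders` frame; the output column names are made up for illustration.

```python
import pandas as pd

# Hypothetical orders; the column names mirror the sales data.
orders = pd.DataFrame({
    "status": ["gold", "gold", "bronze", "bronze", "bronze"],
    "quantity": [10, 20, 5, 15, 10],
    "ext price": [100.0, 300.0, 50.0, 150.0, 100.0],
})

# Each keyword is an output column: (source column, aggregation).
summary = orders.groupby("status").agg(
    total_quantity=("quantity", "sum"),
    mean_quantity=("quantity", "mean"),
    total_sales=("ext price", "sum"),
)
print(summary)
```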

So, what does this tell you? Well, the data is completely random but the first observation is that we sell more units to our bronze customers than gold. Even when you look at the total dollar value associated with bronze vs. gold, it looks backwards.

Maybe we should look at how many bronze customers we have and see what is going on.

What I plan to do is filter out the unique accounts and see how many gold, silver and bronze customers there are.

We are purposely stringing a lot of commands together, which is not necessarily best practice but does show how powerful pandas can be.

In [None]:
all_data_st.drop_duplicates(subset=["account number","name"]).iloc[:,[0,1,7]].groupby(["status"])["name"].count()

Ok. This makes a little more sense. We see that we have 9 bronze customers and only 4 gold customers. That is probably why the volumes are so skewed towards our bronze customers.

# Basic Excel with Python

## Excel App

In [None]:
import os
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
default_path = os.getcwd() + '\\'
# excel.Path is Excel's own folder, by default the MS Office folder under Program Files,
# e.g. C:\Program Files (x86)\Microsoft Office\OfficeXX
# default_path is our current working directory; we use it to save files to a specific directory
print(excel.Path)    # Excel's default directory
print(default_path)  # our current working directory
excel.Application.Quit()  # close the app

### Open Excel, Add a Workbook

The following script simply invokes Excel, adds a workbook and saves the empty workbook.

In [None]:
#
# Add a workbook and save (Excel 2007)
# For older versions of excel, use the .xls file extension
#
import win32com.client as win32
#excel = win32com.client.Dispatch("Excel.Application")
excel = win32.gencache.EnsureDispatch('Excel.Application')
#setting DisplayAlerts to False makes Excel take the default response to every dialog, e.g. it will overwrite a file that already exists
excel.DisplayAlerts = False 
excel.Application.ChDir = default_path
wb = excel.Workbooks.Add()
wb.SaveAs(default_path + 'add_a_workbook.xlsx')
excel.Application.Quit()

### Open an Existing Workbook

This script opens an existing workbook and displays it (note the statement excel.Visible = True). The file customer-status.xlsx must already exist in the current working directory (default_path). You can also open spreadsheet files by specifying the full path to the file. Prefixing the string with r, as in r'C:\myfiles\excel\workbook2.xlsx', makes it a raw string, so the backslash characters do not need to be escaped and the file name stays a bit more concise.

In [None]:
#
# Open an existing workbook
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(default_path + 'customer-status.xlsx')
excel.Visible = True

### Add a Worksheet

This script creates a new workbook with three sheets, adds a fourth worksheet and names it MyNewSheet.

In [None]:
#
# Add a workbook, add a worksheet,
# name it 'MyNewSheet' and save
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Add()
ws = wb.Worksheets.Add()
ws.Name = "MyNewSheet"
wb.SaveAs(default_path + 'add_a_worksheet.xlsx')
excel.Application.Quit()

### Ranges and Offsets

This script illustrates different techniques for addressing cells by using the Cells() and Range() operators. Individual cells can be addressed using Cells(row,column), where row is the row number and column is the column number; both start from 1. Groups of cells can be addressed using Range(), where the argument in the parentheses can be a single cell denoted by its textual name (e.g. "A2"), a group denoted by a textual name with a colon (e.g. "A3:B4") or a group denoted with two Cells() identifiers (e.g. ws.Cells(1,1),ws.Cells(2,2)). The Offset method provides a way to address a cell based on a reference to another cell.

In [None]:
#
# Using ranges and offsets
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Add()
ws = wb.Worksheets("Sheet1")
ws.Cells(1,1).Value = "Cell A1"
ws.Cells(1,1).Offset(2,4).Value = "Cell D2"
ws.Range("A2").Value = "Cell A2"
ws.Range("A3:B4").Value = "A3:B4"
ws.Range("A6:B7,A9:B10").Value = "A6:B7,A9:B10"
wb.SaveAs(default_path + 'ranges_and_offsets.xlsx')
excel.Application.Quit()
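To relate the two addressing styles, the sketch below converts a textual cell name such as "D2" into the 1-based (row, column) pair that Cells() expects. The helper `a1_to_row_col` is a hypothetical name written for this illustration; it is plain Python and is not part of win32com or Excel.

```python
import re

def a1_to_row_col(address):
    """Convert a single-cell name like 'D2' to a 1-based (row, column) pair."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", address)
    if match is None:
        raise ValueError("not a single-cell address: " + repr(address))
    letters, digits = match.groups()
    column = 0
    # Column letters are a base-26 number where A=1 ... Z=26.
    for ch in letters.upper():
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return int(digits), column

print(a1_to_row_col("A1"))
print(a1_to_row_col("D2"))
print(a1_to_row_col("AA10"))
```

With this mapping, Range("D2") and Cells(2,4) refer to the same cell.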

### Autofill Cell Contents

This script uses Excel’s autofill capability to examine data in cells A1 and A2, then autofill the remaining column of cells through A10.

In [None]:
#
# Autofill cell contents
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Add()
ws = wb.Worksheets("Sheet1")
ws.Range("A1").Value = 1
ws.Range("A2").Value = 2
ws.Range("A1:A2").AutoFill(ws.Range("A1:A10"),win32.constants.xlFillDefault)
wb.SaveAs(default_path + 'autofill_cells.xlsx')
excel.Application.Quit()

### Cell Color

This script illustrates adding an interior color to the cell using Interior.ColorIndex. Column A, rows 1 through 20 are filled with a number and assigned that ColorIndex.

In [None]:
#
# Add an interior color to cells
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Add()
ws = wb.Worksheets("Sheet1")
for i in range (1,21):
    ws.Cells(i,1).Value = i
    ws.Cells(i,1).Interior.ColorIndex = i
wb.SaveAs(default_path + 'cell_color.xlsx')
excel.Application.Quit()

### Column Formatting

This script creates two columns of data, one narrow and one wide, then formats the column width with the ColumnWidth property. You can also use the Columns.AutoFit() function to autofit all columns in the spreadsheet.

In [None]:
#
# Set column widths
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Add()
ws = wb.Worksheets("Sheet1")
ws.Range("A1:A10").Value = "A"
ws.Range("B1:B10").Value = "This is a very long line of text"
ws.Columns(1).ColumnWidth = 1
ws.Range("B:B").ColumnWidth = 27
# Alternately, you can autofit all columns in the worksheet
# ws.Columns.AutoFit()
wb.SaveAs(default_path + 'column_widths.xlsx')
excel.Application.Quit()

### Format Worksheet Cells

This script creates two columns of data, then formats the font type and font size used in the worksheet. Five different fonts and sizes are used, the numbers are formatted using a monetary format.

In [None]:
#
# Format cell font name and size, format numbers in monetary format
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Add()
ws = wb.Worksheets("Sheet1")

for i,font in enumerate(["Arial","Courier New","Garamond","Georgia","Verdana"]):
    ws.Range(ws.Cells(i+1,1),ws.Cells(i+1,2)).Value = [font,i+i]
    ws.Range(ws.Cells(i+1,1),ws.Cells(i+1,2)).Font.Name = font
    ws.Range(ws.Cells(i+1,1),ws.Cells(i+1,2)).Font.Size = 12+i

ws.Range("A1:A5").HorizontalAlignment = win32.constants.xlRight
ws.Range("B1:B5").NumberFormat = "$###,##0.00"
ws.Columns.AutoFit()
wb.SaveAs(default_path + 'format_cells.xlsx')
excel.Application.Quit()

### Setting Row Height

This script illustrates row height. Similar to column width, row height can be set with the RowHeight property. You can also use AutoFit() to automatically adjust the row height based on cell contents.

In [None]:
#
# Set row heights and align text within the cell
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Add()
ws = wb.Worksheets("Sheet1")
ws.Range("A1:A2").Value = "1 line"
ws.Range("B1:B2").Value = "Two\nlines"
ws.Range("C1:C2").Value = "Three\nlines\nhere"
ws.Range("D1:D2").Value = "This\nis\nfour\nlines"
ws.Rows(1).RowHeight = 60
ws.Range("2:2").RowHeight = 120
ws.Rows(1).VerticalAlignment = win32.constants.xlCenter
ws.Range("2:2").VerticalAlignment = win32.constants.xlCenter

# Alternately, you can autofit all rows in the worksheet
# ws.Rows.AutoFit()

wb.SaveAs(default_path + 'row_height.xlsx')
excel.Application.Quit()

### Copying Data from Worksheet to Worksheet

This script uses the FillAcrossSheets() method to copy data from one location to all other worksheets in the workbook. Specifically, the data in the range A1:J10 is copied from Sheet1 to sheets Sheet2 and Sheet3.

In [None]:
#
# Copy data and formatting from a range of one worksheet
# to all other worksheets in a workbook
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Add()
ws = wb.Worksheets("Sheet1")
ws.Range("A1:J10").Formula = "=row()*column()"
wb.Worksheets.FillAcrossSheets(wb.Worksheets("Sheet1").Range("A1:J10"))
wb.SaveAs(default_path + 'copy_worksheet_to_worksheet.xlsx')
excel.Application.Quit()

### Conditional Formatting

This script builds two data tables from scratch, applies conditional formatting to the tables and saves the result to ConditionalFormatting.xlsx. This script only works with Excel 2007 and later versions.

In [None]:
#http://pythonexcels.com/mapping-excel-vb-macros-to-python-revisited/
#
# conditionalformatting.py
# Create two tables and apply Conditional Formatting
#
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
#excel.Visible = True
wb = excel.Workbooks.Add()
ws = wb.Worksheets('Sheet1')
ws.Range("B2:K2").Value = [i for i in range(1,11)]
ws.Range("B2:B11").Value = [i for i in range(1,11)]
ws.Range("C3").Formula = "=$B3*C$2"
ws.Range("C3:C3").Select()
excel.Selection.AutoFill(ws.Range("C3:K3"),win32.constants.xlFillDefault)
ws.Range("C3:K3").Select()
excel.Selection.AutoFill(ws.Range("C3:K11"),win32.constants.xlFillDefault)
ws.Range("B13:K22").Formula = "=INT(RAND()*100)"
ws.Range("B2:K22").Select()
excel.Selection.FormatConditions.AddColorScale(ColorScaleType = 3)
excel.Selection.FormatConditions(excel.Selection.FormatConditions.Count).SetFirstPriority()
[csc1,csc2,csc3] = [excel.Selection.FormatConditions(1).ColorScaleCriteria(n) for n in range(1,4)]
csc1.Type = win32.constants.xlConditionValueLowestValue
csc1.FormatColor.Color = 13011546
csc1.FormatColor.TintAndShade = 0
csc2.Type = win32.constants.xlConditionValuePercentile
csc2.Value = 50
csc2.FormatColor.Color = 8711167
csc2.FormatColor.TintAndShade = 0
csc3.Type = win32.constants.xlConditionValueHighestValue
csc3.FormatColor.Color = 7039480
csc3.FormatColor.TintAndShade = 0
ws.Range("A1").Select()
wb.SaveAs(default_path + 'ConditionalFormatting.xlsx')
excel.Application.Quit()
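The integers assigned to FormatColor.Color above are Excel color values, which pack the red component in the lowest byte, then green, then blue. The sketch below decodes them in plain Python; the helper names are hypothetical and are not part of win32com.

```python
def excel_color_to_rgb(color):
    """Split an Excel color integer into its (red, green, blue) components."""
    red = color & 0xFF
    green = (color >> 8) & 0xFF
    blue = (color >> 16) & 0xFF
    return red, green, blue

def rgb_to_excel_color(red, green, blue):
    """Pack (red, green, blue) into the integer form Excel expects."""
    return red + (green << 8) + (blue << 16)

# The three colors used by the color scale in the script above.
for value in (13011546, 8711167, 7039480):
    print(value, excel_color_to_rgb(value))
```

Decoded this way, the lowest values are shaded blue, the midpoint yellow and the highest values red.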

## Mapping Excel VB Macros to Python
A handy feature in Excel is the ability to quickly record a Visual Basic (VB) macro and save it. It’s also fairly simple to take a captured VB macro, tweak it slightly and use it in your Python scripts. This capability has been used over the years to capture a sequence of operations that modify a spreadsheet and build a pivot table or chart, then integrate the macro into a Python script.
We will check how to capture a simple set of operations in a macro, examine the Visual Basic macro, port it to Python and run it. 

### Record Macro
The first step is to capture the macro in Excel using Record Macro. In Excel 2007 and later, the Developer tab that contains the Record Macro button is turned off by default. You will need to enable it by selecting “Excel Options” from the ribbon menu, then selecting “Popular” in the left-hand column and checking the “Show Developer tab in the Ribbon” checkbox as shown here.

![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091012_exceloptions.png)

Starting with a simple spreadsheet containing a table of data, click on the “Developer” tab, then “Record Macro”.

![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091012_recordmacro.png)

If you’re using an older version of Excel, select Tools->Macro->Record New Macro from the menu as shown here.

![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091012_recordmacroexcel2003.png)

The goal is to expand the existing table to a 15×15 table, adjust the column width to make the table appear more square and save the new spreadsheet. Now that the macro is recording, the first step is to select the last row of data and expand it by dragging it down an additional 5 rows. First, select the data:

![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091012_selectrow.png)

Then drag it down to create 5 new rows of data.

![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091012_dragrow.png)

Do the same select and drag operation for the last column of data to create 5 new columns.

![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091012_selectcolumn.png)
![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091012_dragcolumn.png)

Now you have a 15×15 multiplication table. To resize the columns, select the headers for columns B through P, click the right mouse and select “Column Width”.

![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091012_columnwidth.png)

Enter “4” as the new column width and click OK. The spreadsheet will now look like this:

![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091012_resizecolumns.png)

Now stop capturing the macro by clicking on Stop Recording.
![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091012_stoprecording.png)


If you’re using an older version of Excel, select Tools->Macro->Stop Recording from the menu bar.

To view the macro, click on the View Macros button

![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091012_viewmacros.png)

For older versions of Excel, select Tools->Macro->Macros

Select the macro you just recorded (this should be Macro1, but if you were experimenting you may have other macros, so select the highest numbered macro) and click Edit.

![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091012_editmacro.png)

This will open your macro in the Microsoft Visual Basic GUI, and it should look something like this

```vbscript
Sub Macro1()
'
' Macro1 Macro
'
    Range("B11:K11").Select
    Selection.AutoFill Destination:=Range("B11:K16"), Type:=xlFillDefault
    Range("B11:K16").Select
    Range("K2:K16").Select
    Selection.AutoFill Destination:=Range("K2:P16"), Type:=xlFillDefault
    Range("K2:P16").Select
    Columns("B:P").Select
    Selection.ColumnWidth = 4
End Sub
```

Don’t worry if there are some extra or redundant lines in your macro, they can be removed as the script is ported. Now we’re ready to fire up Python and integrate this macro into a script.

### Porting

To get started, open the spreadsheet with the 10×10 multiplication table by entering the following commands (make sure the file “MultiplicationTable.xlsx” is in the folder of this notebook).

In [None]:
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open('MultiplicationTable.xlsx')
excel.Visible = True
excel.DisplayAlerts = False 

The notebook code should now open a new Excel app with the file "MultiplicationTable.xlsx".

The commands above are boilerplate that you’ll be using in each exercise to invoke and interface with Excel.

* [x] The first two commands, import win32com.client as win32 and excel = win32.gencache.EnsureDispatch('Excel.Application'), import the win32 module and start the Excel process.

* [x] The command wb = excel.Workbooks.Open('MultiplicationTable.xlsx') opens the workbook. In general, you’ll need an excel.Workbooks.Open() or excel.Workbooks.Add() command to open an existing workbook or create a new workbook.

* [x] The command excel.Visible = True makes the Excel app visible on the screen, rather than running as a hidden process in the background.

* [x] The command excel.DisplayAlerts = False suppresses Excel's prompts; whenever a message requires a response, Excel takes the default response, so dialogs do not interrupt execution.


Looking at the "Macro1" code, the first command is Range("B11:K11").Select. The name Range is resolved within the context of the worksheet, so you need to create a variable that refers to the worksheet for these operations.

* The command ws = wb.Worksheets('Sheet1') will do the trick.

In [None]:
ws = wb.Worksheets('Sheet1')

Once the variable pointing to the worksheet is defined, append the macro command to ws. and try it. Note that Select is a function and requires the open and close parenthesis pair in order to operate correctly. This pattern may be used for every Range().Select line in the macro.

In [None]:
ws.Range("B11:K11").Select()

If you bring the worksheet to the foreground, you’ll see that the range B11:K11 has been selected. 
The next task is to autofill the 5 rows below using the `Selection.AutoFill Destination:=Range("B11:K16"), Type:=xlFillDefault` construct. `Selection` is a property at the Excel Application level, so you need to precede it with `excel.` in this example.

The arguments `Destination:=Range("B11:K16"), Type:=xlFillDefault` must be provided to the function, either using the keyword arguments Destination and Type, or by using positional notation. To make your programs as robust as possible, you should include the keywords, but it’s not strictly required and I don’t use that pattern in this example.

The definition for the constant `xlFillDefault` is contained in `win32.constants`; you can access this value by specifying `win32.constants.xlFillDefault`. You can always use the Object Browser in the VB window to figure out the correct value (open the Object Browser by pressing F2, or by selecting View->Object Browser from the menu in the VB window).

![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091012_vbobjectbrowser.png)

Combining these translations, the full Python command is `excel.Selection.AutoFill(Destination=ws.Range("B11:K16"), Type=win32.constants.xlFillDefault)`, or `excel.Selection.AutoFill(ws.Range("B11:K16"), win32.constants.xlFillDefault)`, as shown below.

In [None]:
excel.Selection.AutoFill(ws.Range("B11:K16"),win32.constants.xlFillDefault)

The upcoming commands `Range("K2:K16").Select` and `Selection.AutoFill Destination:=Range("K2:P16"), Type:=xlFillDefault` are translated in the same fashion as the earlier Select and AutoFill commands, as shown below.

In [None]:
ws.Range("K2:K16").Select()
excel.Selection.AutoFill(ws.Range("K2:P16"),win32.constants.xlFillDefault)

The worksheet is now expanded to the full 15×15 table and looks like this:

![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091012_worksheetfilled.png)

The next section of the macro selects columns B through P and sets their width to 4. In the statement Columns("B:P").Select, Columns is a property of the worksheet, so prefix it with the ws. identifier and add the parentheses to make Select a Python function call. In the next statement, Selection is a property of excel, so prefix it as such. The translated statements are shown below.

In [None]:
ws.Columns("B:P").Select()
excel.Selection.ColumnWidth = 4

Finally, we save the file and close Excel.

In [None]:
wb.SaveAs('NewMultiplicationTable.xlsx')
excel.Application.Quit()

## Porting Reference Table


<table border="1">
<colgroup>
<col width="48%">
<col width="52%"></colgroup>
<thead valign="bottom">
<tr>
<th>VB</th>
<th>Python</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td></td>
<td>import win32com.client as win32</td>
</tr>
<tr>
<td></td>
<td>excel = win32.gencache.EnsureDispatch('Excel.Application')</td>
</tr>
<tr>
<td></td>
<td>wb = excel.Workbooks.Open('MultiplicationTable.xlsx')</td>
</tr>
<tr>
<td></td>
<td>excel.Visible = True</td>
</tr>
<tr>
<td></td>
<td>ws = wb.Worksheets('Sheet1')</td>
</tr>
<tr>
<td>Range("B11:K11").Select</td>
<td>ws.Range("B11:K11").Select()</td>
</tr>
<tr>
<td>Selection.AutoFill Destination:=Range("B11:K16"), Type:=xlFillDefault</td>
<td>excel.Selection.AutoFill(ws.Range("B11:K16"),win32.constants.xlFillDefault)</td>
</tr>
<tr>
<td>Range("K2:K16").Select</td>
<td>ws.Range("K2:K16").Select()</td>
</tr>
<tr>
<td>Selection.AutoFill Destination:=Range("K2:P16"), Type:=xlFillDefault</td>
<td>excel.Selection.AutoFill(ws.Range("K2:P16"),win32.constants.xlFillDefault)</td>
</tr>
<tr>
<td>Range("K2:P16").Select</td>
<td>ws.Range("K2:P16").Select()</td>
</tr>
<tr>
<td>Columns("B:P").Select</td>
<td>ws.Columns("B:P").Select()</td>
</tr>
<tr>
<td>Selection.ColumnWidth = 4</td>
<td>excel.Selection.ColumnWidth = 4</td>
</tr>
<tr>
<td></td>
<td>excel.Application.Quit()</td>
</tr>
</tbody>
</table>

## Run a macro inside a Workbook

In [None]:
import win32com.client as win32
xl = win32.Dispatch("Excel.Application")
xl.Visible = False

xl.Workbooks.Open(Filename=default_path + 'HelloMacro.xlsm')
messagetxt = "This file was created by a macro called from python :) Test1"
#the macro will create HelloPython.txt in "default_path"
xl.Application.Run("writefile",messagetxt)
xl.Quit()

# Excel Lab

In this example, a fictional company called ABCD Catering has recorded sales and order history for 2009 in their corporate ERP system. ABCD Catering provides catering services to leading Silicon Valley companies, providing the best in hamburgers, hot dogs, churros, sodas and other comfort food. Your boss has asked you to examine this data, answer some questions and produce charts representing some of the data:

* What were the total sales in each of the last four quarters?
* What are the sales for each food item in each quarter?
* Who were the top 10 customers for ABCD catering in Q1?
* Who was the highest producing sales rep for the year?
* What food item had the highest unit sales in Q4?

Generating this information typically involves running five separate reports in the system. Since your boss is looking for this same information at the end of each quarter, you want to simplify your life and your boss's by automating the report. Using Python and Excel, you can download a spreadsheet copy of the raw data, process it, generate the key figures and charts and save them to a spreadsheet.

Take a look at the data in ABCDCatering.xls:

![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091102_original.png)

The spreadsheet contains some header information, then a large table of records for each order. Each record contains the fiscal year and quarter, food item, company name, order date, sales representative, booking and order quantity for each order. The data needs some work before you can use it in a pivot table. First, the data in rows 1 through 11 must be ignored because it’s meaningless for the pivot table. Also, some columns do not have a proper header and must be corrected before the data can be used. The good news is that after some minor massaging, this data will be ideally suited for processing with a pivot table in Excel. Close the spreadsheet and get ready to build the reports.

The program begins with the standard boilerplate: import the win32 module and start Excel.

## Start Excel

In [None]:
import win32com.client as win32
import sys
import os

excel = win32.gencache.EnsureDispatch('Excel.Application')
excel.Visible = True

Next, open the spreadsheet ABCDCatering.xls with some exception handling. The try/except clause attempts to open the file with the Workbooks.Open() method, and exits gracefully if the file is missing or some other problem occurred. Lastly, the variable ws is set to the spreadsheet containing the data.

## Open Workbook

In [None]:
# Build the full path to the workbook from the current working directory
workbook_path = os.path.join(os.getcwd(), 'ABCDCatering.xls')

try:
    wb = excel.Workbooks.Open(workbook_path)
except Exception:
    print("Failed to open spreadsheet ABCDCatering.xls")
    sys.exit(1)
ws = wb.Sheets('Sheet1')

An easy way to load the entire spreadsheet into Python is the UsedRange property, as in the following command:

## Load Sheet Data in Python

In [None]:
xldata = ws.UsedRange.Value

The above code grabs all the data in the Sheet1 worksheet and copies it into a tuple named xldata. Once inside Python, the data can be manipulated and placed back into the spreadsheet with minimal calls to the COM interface, resulting in faster, more efficient processing.
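
As a rough illustration of the shape of that value, the sketch below uses a small hand-written tuple in place of `ws.UsedRange.Value`, so it runs without Excel. The sample values and the name `sample_xldata` are invented for illustration and are not taken from ABCDCatering.xls. It shows that each row is itself a tuple, that empty cells arrive as `None`, and that a cell cannot be changed in place.

```python
# Stand-in for ws.UsedRange.Value: a tuple of row tuples (sample values are made up)
sample_xldata = (
    ("ABCD Catering", None, None),
    ("Fiscal Year", "Food", None),
    (2009, 1001.0, "Hamburger"),
)

first_row = sample_xldata[0]
print(type(first_row).__name__)   # tuple
print(first_row[-1] is None)      # True: the empty cell is None

# Tuples are immutable, so assigning to a cell raises TypeError
try:
    sample_xldata[1][2] = "Food Name"
    assignment_worked = True
except TypeError:
    assignment_worked = False
print(assignment_worked)          # False
```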

## Check Data

To delete rows, add columns and do other operations on the data, it must be converted or copied to a list. The approach used here is to examine the data row by row, discarding the nonessential header rows and copying everything else to a new list. The first step is to remove the rows that are not part of the column header row or record data. If you are developing the program interactively, you can investigate the data in the xldata tuple and display the data for the first record (xldata[0]) and the header record (xldata[11]):

In [None]:
xldata[0]

In [None]:
xldata[11]

The length of both rows is 13, though xldata[0] contains many elements with a value of None. The following code checks the length of each row and skips any row that does not have 13 fields or that contains None in the last field. Note that this code assumes that the actual data in the table always contains complete records. That is true for this dataset, but you should always understand the characteristics of the data you’re working on.

## Fix Data

In [None]:
newdata = []
for row in xldata:
    if row[-1] is not None and len(row) == 13:
        # Copy each tuple row to a list so the header cells can be edited later
        newdata.append(list(row))
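
To try the filtering rule without Excel, the same check can be wrapped in a small function and run on a made-up sample. This is only a sketch: `keep_table_rows`, its `width` parameter and the three-column sample are hypothetical names and data chosen for illustration, standing in for the 13-column sheet.

```python
def keep_table_rows(rows, width=13):
    """Return full-width rows whose last field is filled, as editable lists."""
    return [list(row) for row in rows
            if len(row) == width and row[-1] is not None]

# Made-up 3-column sample standing in for the 13-column sheet
sample = (
    ("Report title", None, None),      # banner row: last field is empty, dropped
    ("Company", None, "Quantity"),     # header row with a blank header, kept
    (1001.0, "Acme", 12.0),            # record, kept
)
cleaned = keep_table_rows(sample, width=3)
print(cleaned)
```

Note that the rule only inspects the last field, so a header row with a blank label in the middle survives the filter, which is exactly what the next step relies on.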

The `newdata` list now contains the header and data rows from the spreadsheet, but the header row is still not complete. All column headers must contain text in order to use this data in a pivot table. Unfortunately, the spreadsheet downloads produced by the ERP system have the column label over the numerical identifier for the item, while the text column header is blank. You can see that for the “Food” and “Company” data below.

![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091102_foodcompany.png)

One approach that works for this data is to scan the header and insert a column header based on the contents of the previous column. For example, the label for column F could be “Company Name”, created by simply appending the text “ Name” to the column header “Company” from the prior column. Using this simple algorithm, the column header row can be filled out and the spreadsheet made ready for pivot table conversion. A more complex lookup could be used as well, but the simple algorithm described here will scale if new fields are added to the report.

In [None]:
for i,field in enumerate(newdata[0]):
    if field is None:
        newdata[0][i] = lasthdr + " Name"
    else:
        lasthdr = newdata[0][i]
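
The header-filling rule can also be tried on its own, without a workbook. The sketch below is a standalone variant: `fill_blank_headers`, its `suffix` parameter and the sample header are hypothetical, and unlike the loop above it returns a new list and starts from an empty label so that a blank first column does not raise an error.

```python
def fill_blank_headers(header, suffix=" Name"):
    """Return a copy of header with each None replaced by '<previous label><suffix>'."""
    filled = []
    last_label = ""          # fallback if the very first header is blank
    for field in header:
        if field is None:
            filled.append(last_label + suffix)
        else:
            filled.append(field)
            last_label = field
    return filled

sample_header = ["Fiscal Year", "Food", None, "Company", None, "Net Booking"]
print(fill_blank_headers(sample_header))
# ['Fiscal Year', 'Food', 'Food Name', 'Company', 'Company Name', 'Net Booking']
```

One limitation to keep in mind: two blank headers in a row would both receive the same generated label, so this rule suits layouts like this one where every blank column follows a labeled column.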

Now the data is ready for insertion back into the spreadsheet. To enable comparison between the new data set and the original, create a new sheet in the workbook, write the data to the new sheet and autofit the columns.

In [None]:
wsnew = wb.Sheets.Add()
wsnew.Range(wsnew.Cells(1,1),wsnew.Cells(len(newdata),len(newdata[0]))).Value = newdata
wsnew.Columns.AutoFit()
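
The target range above is sized from `len(newdata)` and `len(newdata[0])`, so it assumes every row has the same number of fields. A small check before writing can make that assumption explicit. This is a sketch only; `table_shape` and `sample_rows` are hypothetical names and are not part of the original script.

```python
def table_shape(rows):
    """Return (row_count, column_count), raising ValueError if rows are ragged."""
    if not rows:
        raise ValueError("no rows to write")
    width = len(rows[0])
    for index, row in enumerate(rows):
        if len(row) != width:
            raise ValueError(f"row {index} has {len(row)} fields, expected {width}")
    return len(rows), width

sample_rows = [["Company", "Company Name", "Quantity"],
               [1001.0, "Acme", 12.0]]
print(table_shape(sample_rows))   # (2, 3)
```

The returned pair could then be passed to `wsnew.Cells(...)` in place of the two `len()` calls.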

The last step is to save the workbook to a new file and quit Excel.

In [None]:
wb.SaveAs('newABCDCatering.xlsx')
excel.Application.Quit()
#wb.SaveAs('newABCDCatering.xlsx',win32.constants.xlOpenXMLWorkbook)

## Check the output

After running the script, open the file newABCDCatering.xlsx or newABCDCatering.xls and view the contents. Note that the extraneous header information has been removed and blank column header information has been inserted programmatically as described earlier.

![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091102_exceloutput.png)

The new spreadsheet is ready for use in a pivot table :)

## Generate Reports

Pivot tables are an easy-to-use tool to derive some basic business intelligence from your data. As discussed last time, there are occasions when you’ll need to do interactive data mining by changing column and row fields. But in my experience, it’s handy to have my favorite reports built automatically, with the reports ready to go as soon as I open the spreadsheet. In this section I’ll develop and explain the code to create a set of pivot tables automatically in a worksheet.

The goal of this exercise is to automate the generation of pivot tables and save them to a new Excel file.
![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091123_reports.png)

Load newABCDCatering.xls from the previous step and record a macro that creates this simple pivot table showing Net Bookings by Sales Rep and Food Name for the last four quarters, as below.
![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091123_setup.png)

The captured macro should be similar to this.

```vbscript
Sub Macro2()
'
' Macro2 Macro
'

'
    Selection.CurrentRegion.Select
    Sheets.Add
    ActiveWorkbook.PivotCaches.Create(SourceType:=xlDatabase, SourceData:= _
        "Sheet2!R1C1:R791C13", Version:=xlPivotTableVersion10).CreatePivotTable _
        TableDestination:="Sheet3!R3C1", TableName:="PivotTable1", DefaultVersion _
        :=xlPivotTableVersion10
    Sheets("Sheet3").Select
    Cells(3, 1).Select
    With ActiveSheet.PivotTables("PivotTable1").PivotFields("Fiscal Year")
        .Orientation = xlPageField
        .Position = 1
    End With
    With ActiveSheet.PivotTables("PivotTable1").PivotFields("Fiscal Quarter")
        .Orientation = xlColumnField
        .Position = 1
    End With
    With ActiveSheet.PivotTables("PivotTable1").PivotFields("Sales Rep Name")
        .Orientation = xlRowField
        .Position = 1
    End With
    With ActiveSheet.PivotTables("PivotTable1").PivotFields("Food Name")
        .Orientation = xlRowField
        .Position = 2
    End With
    ActiveSheet.PivotTables("PivotTable1").AddDataField ActiveSheet.PivotTables( _
        "PivotTable1").PivotFields("Net Booking"), "Sum of Net Booking", xlSum
End Sub
```

Looking at the macro, you see lines specifying the Orientation of the field name, such as `.Orientation = xlRowField` and `.Orientation = xlColumnField`. A pivot table has four basic areas for fields:

* Report Filter (.Orientation = xlPageField)
* Column area (.Orientation = xlColumnField)
* Row area (.Orientation = xlRowField)
* Values area (PivotTables().AddDataField())

Each of these supports multiple fields (row fields for `Sales Rep Name` and `Food Name` were added in the example). The ordering of the fields changes the appearance of the table.

A general pattern should be apparent in this macro. First, the pivot table is created with the `ActiveWorkbook.PivotCaches.Create()` statement. Next, the columns and rows are configured with a series of `ActiveSheet.PivotTables("PivotTable1").PivotFields()` statements. Finally, the field used in the Values section of the table is configured using the `ActiveSheet.PivotTables("PivotTable1").AddDataField` statement. The general purpose function will need to contain all of these constructs. Note the parts that can’t be hard-coded in the general solution: the source of the data, `"Sheet2!R1C1:R791C13"`, and the destination for the table, `"Sheet3!R3C1"`, need to be determined from the characteristics of the source data.

In Python, this pattern can be reduced to the following loop that covers fields for the Report Filter, Columns and Rows:

In [None]:
def addpivot(wb,sourcedata,title,filters=(),columns=(),rows=(),sumvalue=(),sortfield=""):
    """Build a pivot table using the provided source location data
    and specified fields
    """
    # Excerpt: win32c is assumed to be win32.constants, and tname the name of the
    # pivot table created earlier in the complete function
    for fieldlist,fieldc in ((filters,win32c.xlPageField),
                            (columns,win32c.xlColumnField),
                            (rows,win32c.xlRowField)):
        for i,val in enumerate(fieldlist):
            wb.ActiveSheet.PivotTables(tname).PivotFields(val).Orientation = fieldc
            wb.ActiveSheet.PivotTables(tname).PivotFields(val).Position = i+1
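
To see what the nested loop does without starting Excel, the sketch below performs a dry run that only records which orientation and position each field would receive. The function `field_plan` and the `AREAS` tuple are hypothetical; plain strings named after the Excel constants stand in for the real `win32c` values.

```python
# Names mirror the Excel constants; plain strings are used so no Excel is needed
AREAS = ("xlPageField", "xlColumnField", "xlRowField")

def field_plan(filters=(), columns=(), rows=()):
    """List (field, orientation, position) in the order the loop would apply them."""
    plan = []
    for fieldlist, orientation in zip((filters, columns, rows), AREAS):
        for i, name in enumerate(fieldlist):
            # Position is 1-based, matching .Position in the recorded macro
            plan.append((name, orientation, i + 1))
    return plan

for step in field_plan(filters=("Fiscal Year",),
                       columns=("Fiscal Quarter",),
                       rows=("Sales Rep Name", "Food Name")):
    print(step)
```

With these arguments the plan reproduces the four `With ... End With` blocks of the recorded macro, including position 2 for `Food Name`.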

Processing the Values field is more or less copied from the Visual Basic. To keep things simple in this example, this code is limited to adding “Sum of” values only, and doesn’t handle other Summarize Value functions such as Count, Min, Max, etc.

In [None]:
wb.ActiveSheet.PivotTables(tname).AddDataField(
    wb.ActiveSheet.PivotTables(tname).PivotFields(sumvalue[7:]),
    sumvalue,
    win32c.xlSum)
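
The slice `sumvalue[7:]` works because the caption is assumed to start with the seven characters of `"Sum of "`, leaving the underlying field name. A slightly more defensive version of that step might look like the following sketch, where `data_field_name` and its `prefix` parameter are hypothetical helpers rather than part of the original function.

```python
def data_field_name(sumvalue, prefix="Sum of "):
    """Return the source field name for a caption such as 'Sum of Net Booking'."""
    if not sumvalue.startswith(prefix):
        raise ValueError("expected a caption starting with %r" % prefix)
    return sumvalue[len(prefix):]

print(data_field_name("Sum of Net Booking"))   # Net Booking
print("Sum of Net Booking"[7:])                # Net Booking (the slice used above)
```

Checking the prefix first means a caption such as "Count of Orders" fails loudly instead of being silently sliced into a field name that does not exist.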

The actual values for filters, columns and rows in the function are defined in the call to the function. The complete function creates a new sheet within the workbook, then adds an empty pivot table to the sheet and builds the table using the field information provided. For example, to answer the question: What were the total sales in each of the last four quarters?, the pivot table is built with the following call to the addpivot function:

In [None]:
# What were the total sales in each of the last four quarters?
addpivot(wb,src,
         title="Sales by Quarter",
         filters=(),
         columns=(),
         rows=("Fiscal Quarter",),
         sumvalue="Sum of Net Booking",
         sortfield=())

The above code defines a pivot table using the row header “Fiscal Quarter” and data value “Sum of Net Booking”. The title “Sales by Quarter” is used to name the sheet itself.
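
The `src` argument passed to `addpivot` is not defined in this excerpt. One plausible way to build it, assuming the cleaned table starts at row 1, column 1 of its sheet, is to format an R1C1 reference from the sheet name and the table dimensions. The helper `pivot_source` is hypothetical, and the sketch assumes a sheet name without spaces or punctuation.

```python
def pivot_source(sheet_name, row_count, column_count):
    """Build an R1C1 reference covering a table that starts at row 1, column 1."""
    return "%s!R1C1:R%dC%d" % (sheet_name, row_count, column_count)

# Reproduces the hard-coded source string from the recorded macro
print(pivot_source("Sheet2", 791, 13))   # Sheet2!R1C1:R791C13
```

In the workflow above the arguments would presumably come from the new sheet and the cleaned data, for example `pivot_source(wsnew.Name, len(newdata), len(newdata[0]))`.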

To make the output spreadsheet more understandable, the title parameter is passed into the function and used both as a title in each worksheet and as the tab name.

![image.png](https://raw.githubusercontent.com/fjvarasc/DSPXI/master/figures/20091123_titletabsbq.png)

The complete script is available from the links below. Caveats:

* This script has been modified to run on both Excel 2007 and Excel 2003 and has been tested on those versions.
* Adding pivot tables increases the size of the output Excel file. This can be mitigated by disabling caching of pivot table data with `PivotTables(tname).SaveData = False`, but the pivot table must then be refreshed before use by clicking Refresh Data on the PivotTable toolbar.



* http://pythonexcels.com/cleaning-up-corporate-erp-data/
* http://pythonexcels.com/automating-pivot-tables-with-python/
* https://github.com/pythonexcels/examples/blob/master/erppivotextended.py
* http://pythonexcels.com/extending-pivot-table-data/
* http://pythonexcels.com/a-user-friendly-experience/




For more on using win32, please check [this](http://timgolden.me.uk/python/win32_how_do_i.html) link.