# Finding relevant code changes in frameworks and packages

This notebook tracks evolving code bases. It first extracts the changes recorded in the git log, then filters them by time frame and by the functions of interest. The remaining step is to analyse each change and decide whether it matters to a developer who uses that part of the code for differential testing.

## Imports

In [24]:
import os
import inspect
import pandas as pd
from datetime import date, timedelta
import sys
import subprocess

import numpy as np
#from scipy import stats

## Setup: User Input

* The user enters the package they would like to update and the Deep Learning Library (DLL).
* They then enter the version of the package that the DLL currently uses and the version they would like to upgrade to (default: the most recent version). For now, versions are simplified to release dates, since dates are easier to handle with git.
* If the GitHub link for that package is not stored, they also enter the GitHub link for that package.
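Since versions are represented as dates, it can help to parse the date string once and hand git an unambiguous format. The following is a minimal sketch that assumes the day-month-year format named in the setup cell below; the helper names `parse_version_date` and `to_git_date` are hypothetical.

```python
from datetime import datetime

# Assumed input format, taken from the comment in the setup cell:
# day-month-year hour:minute:second
VERSION_DATE_FORMAT = "%d-%m-%Y %H:%M:%S"


def parse_version_date(text):
    """Parse a version date string; raises ValueError on malformed input."""
    return datetime.strptime(text, VERSION_DATE_FORMAT)


def to_git_date(moment):
    """Format a datetime year-first, a form git accepts for --since/--until."""
    return moment.strftime("%Y-%m-%d %H:%M:%S")


parsed = parse_version_date("01-01-2021 00:00:00")
print(to_git_date(parsed))
```

Passing a year-first string to git sidesteps any question of whether a value such as `10-05-2021` is read day-first or month-first.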


In [22]:
# Input 1: Package name
package_name = 'keras'

# Input 2: Deep Learning Library name and directory
dll_name = 'tensorflow'
dll_directory = 'A:/BachelorThesis/DLL_Testing_Tool/DL_Libraries/Tensorflow/tensorflow-1.12.0/tensorflow/python/'

# Input 3: Current version (i.e. date for simplicity) of the package (and optionally the desired version)
# format: day-month-year hour:minute:second
current_version_date = date(2021, 1, 1) # '01-01-2021 00:00:00' (a date object, so that timedelta arithmetic works later)
desired_version_date = date.today()#.strftime("%d-%m-%Y %H:%M:%S")

# Input 4: Github Link (if not stored by the tool)
git_url = 'https://github.com/keras-team/keras.git'

In [23]:
# Input 1: Package name
package_name = 'tensorflow'

# Input 2: Deep Learning Library name and directory
dll_name = 'tensorflow'
dll_directory = 'A:/BachelorThesis/DLL_Testing_Tool/DL_Libraries/Tensorflow/tensorflow-1.12.0/tensorflow/python/'

# Input 3: Current version (i.e. date for simplicity) of the package (and optionally the desired version)
# format: day-month-year hour:minute:second
current_version_date = date(2021, 1, 1) # '01-01-2021 00:00:00' 
desired_version_date = date.today()#.strftime("%d-%m-%Y %H:%M:%S")

# Input 4: Github Link (if not stored by the tool)
git_url = "https://github.com/tensorflow/tensorflow.git"

In [None]:
# Import the package that should be upgraded (used to find the files where extracted functions are defined)
from tensorflow import keras
#import keras

### Tool's internal processing of the inputs

In [55]:
# TODO Check inputs for validity (i.e. does dll directory exist, is date in the correct format, is package known (for git url))

# Setup folder names
clone_folder_name = 'temp_bare_clone_' + package_name
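The validity check named in the TODO could look roughly like the sketch below. It is only an illustration: `validate_inputs` and `KNOWN_GIT_URLS` are hypothetical names, and the stored URLs are simply the two used in this notebook.

```python
import os
from datetime import date

# Hypothetical lookup of stored repository URLs; the two entries are the
# URLs used in this notebook.
KNOWN_GIT_URLS = {
    "keras": "https://github.com/keras-team/keras.git",
    "tensorflow": "https://github.com/tensorflow/tensorflow.git",
}


def validate_inputs(package_name, dll_directory, current_version_date, git_url=None):
    """Return a list of problems found in the user inputs (empty if none)."""
    problems = []
    if not os.path.isdir(dll_directory):
        problems.append("DLL directory does not exist: " + dll_directory)
    # datetime is a subclass of date, so both are accepted here
    if not isinstance(current_version_date, date):
        problems.append("current version must be a date, got: " + repr(current_version_date))
    if git_url is None and package_name not in KNOWN_GIT_URLS:
        problems.append("no stored git URL for package: " + package_name)
    return problems
```

Returning a list instead of raising on the first error lets the tool report all problems to the user at once.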

## Create a bare clone of the library, which only includes repository data

A bare clone contains the full repository history but no checked-out working copy of the code, so the commit log and diffs are available without a second copy of the source tree on disk.

In [None]:
# create a temporary directory for a bare clone of a given library
# (exist_ok=True: do nothing if the directory is already there)
os.makedirs(clone_folder_name, exist_ok=True)

In [None]:
# create the bare clone
!git clone --bare {git_url} {clone_folder_name}
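Outside a notebook, the same clone can be started with `subprocess` instead of the `!` shell escape. This is a sketch only; the function names are hypothetical, and the command is the same `git clone --bare` call as in the cell above.

```python
import subprocess


def bare_clone_command(git_url, clone_folder_name):
    """Build the argument list for a bare clone (repository data, no working tree)."""
    return ["git", "clone", "--bare", git_url, clone_folder_name]


def bare_clone(git_url, clone_folder_name):
    """Run the clone; check=True raises CalledProcessError if git fails."""
    subprocess.run(bare_clone_command(git_url, clone_folder_name), check=True)
```

Passing the arguments as a list avoids shell quoting issues when the folder name contains spaces.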

In [5]:
%cd {clone_folder_name}

A:\BachelorThesis\DLL_Testing_Tool\Code\2_Commit_Extraction_and_Analysis\temp_bare_clone_tensorflow


## Import the extraction data 

In [6]:
# import tensorflow 1.12.0 data
df_tensorflow_1_12_0 = pd.read_csv('../../1_Test_Case_Extraction_and_Analysis/extracted_data/tensorflow_1.12.0_data.csv')

## Filter for only functions of the package




In [7]:
package_name = 'keras'
# As a temporary solution, we will filter these for functions that contain 'package_name.' specifically
column_to_filter = 'Differential_Test_Function'
filter_keyword = package_name + r'\.'  # raw string: '\.' is a regex for a literal dot

relevant_test_cases = df_tensorflow_1_12_0[df_tensorflow_1_12_0[column_to_filter].str.contains(filter_keyword, na=False)]
relevant_test_cases

Unnamed: 0,File_Path,Line_Number,Found_in_Function,Function_Definition_Line_Number,Assert_Statement_Type,Oracle_Argument_Position,Differential_Function_Line_Number,Differential_Test_Function
13418,keras\backend_test.py,55,compare_single_input_op_to_numpy,34,assert_allclose,1,52,keras.backend.eval
13423,keras\backend_test.py,84,compare_single_input_op_to_numpy,34,assert_allclose,1,52,keras.backend.eval
13427,keras\backend_test.py,141,test_learning_phase_scope,135,assertEqual,2,137,keras.backend.learning_phase
13428,keras\backend_test.py,145,test_learning_phase_scope,135,assertEqual,2,137,keras.backend.learning_phase
13429,keras\backend_test.py,149,test_learning_phase_scope,135,assertEqual,2,137,keras.backend.learning_phase
...,...,...,...,...,...,...,...,...
26322,kernel_tests\rnn_test.py,636,testRNNCellSerialization,601,assertAllClose,1,613,keras.models.Model
26323,kernel_tests\rnn_test.py,636,testRNNCellSerialization,601,assertAllClose,1,610,keras.Input
26327,kernel_tests\rnn_test.py,636,testRNNCellSerialization,601,assertAllClose,2,613,keras.models.Model
26328,kernel_tests\rnn_test.py,636,testRNNCellSerialization,601,assertAllClose,2,633,keras.models.Model
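The substring filter is described above as a temporary solution. One reason is that a pattern such as `keras\.` matches anywhere inside a name. The sketch below uses only the standard `re` module to compare it with a stricter pattern; the same regular expression could be passed to `str.contains`. The last two names in the list are made up for illustration.

```python
import re

# The first three names appear in the output above; the last two are
# made-up examples, one of which is a false positive.
names = [
    "keras.backend.eval",
    "keras.models.Model",
    "keras.Input",
    "np.array",
    "my_keras.helper",
]

substring_pattern = re.compile(r"keras\.")          # matches anywhere in the name
anchored_pattern = re.compile(r"(?:^|\.)keras\.")   # only as a whole path component

substring_hits = [n for n in names if substring_pattern.search(n)]
anchored_hits = [n for n in names if anchored_pattern.search(n)]

print(substring_hits)  # includes 'my_keras.helper'
print(anchored_hits)   # only the three keras.* names
```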


## Getting a git diff between the current and the desired version of the extracted function

Procedure:
1. For a single extracted function, get the file it is defined in
2. Use git log to extract the commit id of the current version and the desired version
3. Perform a git diff, comparing the extracted file in those two commits
4. Select only the parts of the git diff that concern the extracted function
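Step 1 needs to turn a dotted name stored as a string into the object it refers to. A possible way to do this without executing generated code is sketched below; `resolve_dotted_name` is a hypothetical helper, and a standard-library class stands in for an extracted function.

```python
import importlib
import inspect


def resolve_dotted_name(dotted_name):
    """Import the longest importable module prefix, then walk the remaining attributes."""
    parts = dotted_name.split(".")
    for split_at in range(len(parts), 0, -1):
        module_name = ".".join(parts[:split_at])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:
            continue  # prefix is not a module; try a shorter one
        for attribute in parts[split_at:]:
            obj = getattr(obj, attribute)
        return obj
    raise ImportError("cannot resolve " + dotted_name)


# Standard-library stand-in for an extracted function name
source_file = inspect.getsourcefile(resolve_dotted_name("json.decoder.JSONDecoder"))
print(source_file)
```

Note that this resolves names against installed top-level modules. If `keras` is meant to refer to `tensorflow.keras`, as in the import cell of this notebook, the name would have to be prefixed accordingly before resolving it.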

In [54]:
#for case_id in range(len(relevant_test_cases)):
#    try:

# for testing only:
case_id = 1170
current_version_date = date(2021,5,10)

# 1: 

# get extracted function as string
extracted_function = relevant_test_cases.iloc[case_id]['Differential_Test_Function']
print("Extracted function: \t" + extracted_function)

# find where the function is defined by evaluating the dotted name
# (eval runs arbitrary code, so the names must come from a trusted source)
extracted_function_file_location = inspect.getsourcefile(eval(extracted_function))
print("Full path: \t\t" + extracted_function_file_location)

# get the package root and remove it from the file path. This relative file path is necessary for a git diff
#package_root = ''
#exec('package_root = inspect.getsourcefile({})'.format(package_name))
# remove the init.py part from the path
#package_root = package_root.replace('__init__.py', '')
#print(package_root)

# remove the package root to get the relative file path 
package_root_index = extracted_function_file_location.index('tensorflow')
print("Index " + str(package_root_index) + " cuts off \t" +  extracted_function_file_location[:package_root_index])
extracted_function_file_location = extracted_function_file_location[package_root_index:] #.replace(package_root,'')

print("Function location: \t{} \n".format(extracted_function_file_location))

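# 2:
# Widen the time window one day at a time until git log returns at least
# one commit that touched the file on or before the current version date.
git_log_current_version = ''
days = 1
max_days = 3650  # assumed upper bound, so a missing file cannot loop forever
while git_log_current_version == '' and days <= max_days:
    command_current_version = ["git", "log", "--since", (current_version_date-timedelta(days=days)).strftime("%d-%m-%Y"), "--until", current_version_date.strftime("%d-%m-%Y"), "--", extracted_function_file_location]
    git_log_current_version = subprocess.run(command_current_version, stdout=subprocess.PIPE).stdout.decode('utf-8')
    print(git_log_current_version)
    days += 1

if git_log_current_version == '':
    raise RuntimeError("No commit found for " + extracted_function_file_location)

# the log starts with "commit <id>", so the id is the second word of the first line
commit_id_current = git_log_current_version.splitlines()[0].split()[1]
print("Commit id: " + commit_id_current + "\n")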

# TODO Check the "tensorflow."+ versions of the files?
#command = ["git", "log", "--oneline", "--name-only", "--since", current_version_date, "--until", desired_version_date, "--", extracted_function_file_location]
#command = ["git", "log", "--oneline", "--since", current_version_date, "--until", desired_version_date, "--", extracted_function_file_location]
#command = ["git", "log", "--oneline", "--since", current_version_date-timedelta(days=1), "--until", current_version_date, "--", extracted_function_file_location]
#command = ["git", "log", "--oneline", "--", extracted_function_file_location]



# 3:

# git diff [<options>] <commit>..<commit> [--] [<path>…]
# Tensorflow test commit ids
commit_id_desired = "3eee56fc4b62f40756277adf231a665ac89d2ab6" 
#commit2 = "7716b719e5418ae91877c4b739771df336bea590" 

# Keras test commit ids
#commit1 =
#commit2 = 
command = ["git", "diff", commit_id_current, commit_id_desired, "--", extracted_function_file_location]

#print(''.join(command) + "\n")

# the variable holds the output of git diff, so name it accordingly
git_diff_output = subprocess.run(command, stdout=subprocess.PIPE).stdout.decode('utf-8')
#with open("git_diff_output.txt", "w+", encoding='utf-8') as text_file:
#    text_file.write(git_diff_output)
print(git_diff_output)

#    except Exception as exc:
#        print("ERROR: {}\n".format(exc))
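As an alternative to widening the `git log` window day by day, git can be asked directly for the newest commit that touched a file before a given date. The sketch below only builds the command; `last_commit_before_command` and `to_git_path` are hypothetical helper names, and the end-of-day cutoff is an assumption about how the version date should be interpreted.

```python
from datetime import date
from pathlib import PureWindowsPath


def to_git_path(path):
    """Convert a Windows-style relative path to the forward-slash form git prints."""
    return PureWindowsPath(path).as_posix()


def last_commit_before_command(version_date, file_path, revision="HEAD"):
    """Build a git rev-list call that prints the newest commit touching
    file_path up to the end of version_date (no output if there is none)."""
    return [
        "git", "rev-list", "-1",
        "--before=" + version_date.isoformat() + " 23:59:59",
        revision, "--", to_git_path(file_path),
    ]


command = last_commit_before_command(
    date(2021, 5, 10), "tensorflow\\python\\keras\\layers\\recurrent.py"
)
print(command)
```

The resulting list can be passed to `subprocess.run` inside the bare clone in the same way as the commands in the cell above; the single line of output, if any, is the commit id.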

Extracted function: 	keras.layers.SimpleRNNCell
Full path: 		A:\Programs\Python36\lib\site-packages\tensorflow\python\keras\layers\recurrent.py
Index 39 cuts off 	A:\Programs\Python36\lib\site-packages\
Function location: 	tensorflow\python\keras\layers\recurrent.py 





commit a9cf3a0e4b419630f0183b0cc4e48e0641a62721
Merge: e77d6acaefa 3eee56fc4b6
Author: TensorFlower Gardener <gardener@tensorflow.org>
Date:   Thu May 6 12:14:44 2021 -0700

    Merge pull request #48725 from amogh7joshi:patch-3
    
    PiperOrigin-RevId: 372395334
    Change-Id: Ie8841999976df629318bc10af1a9e822114d552c

Commit id: a9cf3a0e4b419630f0183b0cc4e48e0641a62721

diff --git a/tensorflow/python/keras/layers/recurrent.py b/tensorflow/python/keras/layers/recurrent.py
index d48e685a1cb..f857748b27d 100644
--- a/tensorflow/python/keras/layers/recurrent.py
+++ b/tensorflow/python/keras/layers/recurrent.py
@@ -1314,9 +1314,11 @@ class SimpleRNNCell(DropoutRNNCellMixin, Layer):
                dropout=0.,
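
Step 4 of the procedure, selecting only the hunks that concern the extracted function, could start from the text after the second `@@` of each hunk header, which names the enclosing class or function as detected by git. The following is a rough sketch under that assumption; `hunks_mentioning` is a hypothetical helper, and the sample diff reuses the hunk header printed above with made-up body lines.

```python
def hunks_mentioning(diff_text, name):
    """Return the hunks of a unified diff whose @@ header line mentions name."""
    hunks, current = [], None
    for line in diff_text.splitlines():
        if line.startswith("@@"):
            current = [line]        # a new hunk starts with its header
            hunks.append(current)
        elif line.startswith("diff --git"):
            current = None          # a new file section starts; close the hunk
        elif current is not None:
            current.append(line)
    return ["\n".join(h) for h in hunks if name in h[0]]


# The first hunk header is taken from the output above; all other lines are made up.
sample_diff = "\n".join([
    "diff --git a/example.py b/example.py",
    "--- a/example.py",
    "+++ b/example.py",
    "@@ -1314,9 +1314,11 @@ class SimpleRNNCell(DropoutRNNCellMixin, Layer):",
    " unchanged line",
    "+added line",
    "@@ -2000,3 +2002,3 @@ class LSTMCell(Layer):",
    "-removed line",
    "+replacement line",
])
selected = hunks_mentioning(sample_diff, "SimpleRNNCell")
print(len(selected))
```

This heuristic is coarse: a change inside a method is typically labelled with the method's `def` line instead of the class name, so a more reliable filter would compare the hunk's line ranges with the line span of the function.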
        

## Install section (for testing)

In [None]:
!python --version

In [None]:
# install the package (TODO if not already installed)
!{sys.executable} -m pip install {package_name}==2.2.4
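
The "if not already installed" check from the TODO could be sketched as follows. `is_installed` and `pip_install_command` are hypothetical helpers; the check only works for top-level module names and assumes the module name equals the package name, which is not true for every package.

```python
import importlib.util
import sys


def is_installed(module_name):
    """True if a top-level module of that name can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package_name, version=None):
    """Build a pip call for the current interpreter, optionally pinned to a version."""
    requirement = package_name if version is None else package_name + "==" + version
    return [sys.executable, "-m", "pip", "install", requirement]
```

A cell could then run the install only when `is_installed(package_name)` is false.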

In [None]:
!{sys.executable} -m pip install tensorflow

In [None]:
%pip install tensorflow==1.12.0

In [13]:
!{sys.executable} -m pip show keras

Name: Keras
Version: 2.2.4
Summary: Deep Learning for humans
Home-page: https://github.com/keras-team/keras
Author: Francois Chollet
Author-email: francois.chollet@gmail.com
License: MIT
Location: a:\programs\python36\lib\site-packages
Requires: numpy, h5py, keras-preprocessing, pyyaml, six, scipy, keras-applications
Required-by: 


In [None]:
!{sys.executable} --version

In [None]:
sys.executable

## TESTING SECTION:

In [None]:
!start .
#os.system("git log --oneline -- {extracted_function_file_location}")
# --since "20-06-2021 00:00:00" -p

In [None]:
# helper for finding where functions are defined
from scipy import stats  # not imported in the Imports section above

print(inspect.getsourcefile(stats.t.logpdf) + "\n")
print(inspect.getsource(stats.t.logpdf))

In [None]:
# Different git urls:
#git_url = "https://github.com/pytorch/pytorch.git"
#git_url = "https://github.com/scipy/scipy.git"
#git_url = "https://github.com/keras-team/keras.git"

### Testing git log functions

The `-p` option makes `git log` show the diff introduced by each commit.

Each hunk of differences starts with a header of the form @@ from-file-range to-file-range @@ [header].  
The from-file-range has the form -\<start line\>,\<number of lines\>, and the to-file-range has the form +\<start line\>,\<number of lines\>.
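
The hunk header format can be parsed with a regular expression. This is a minimal sketch; `parse_hunk_header` is a hypothetical helper, and it accounts for git omitting the line count when it equals 1.

```python
import re

# In a real diff the ",<number of lines>" part is omitted when it equals 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$")


def parse_hunk_header(line):
    """Return (from_start, from_count, to_start, to_count, header) or None."""
    match = HUNK_HEADER.match(line)
    if match is None:
        return None
    from_start, from_count, to_start, to_count, header = match.groups()
    return (
        int(from_start),
        int(from_count) if from_count is not None else 1,
        int(to_start),
        int(to_count) if to_count is not None else 1,
        header,
    )


print(parse_hunk_header("@@ -1314,9 +1314,11 @@ class SimpleRNNCell(DropoutRNNCellMixin, Layer):"))
```

For the header shown, the result is `(1314, 9, 1314, 11, 'class SimpleRNNCell(DropoutRNNCellMixin, Layer):')`: nine lines starting at line 1314 in the old file correspond to eleven lines starting at line 1314 in the new file.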

In [None]:
!git log --oneline -- {extracted_function_file_location}

In [None]:
!git log --since="3 hours ago" --pretty=oneline

In [None]:
!git log --name-only --date=local --since "20-06-2021 00:00:00" 

In [None]:
!git log --name-only --oneline --since "20-06-2021 00:00:00"

In [None]:
!git log --name-only --oneline --since "20-06-2021 00:00:00"
#--since "20-06-2021 00:00:00" -p -- scipy/special/_basic.py