![txtvid](png/txtvid.png)

# Text_to_Video_Edits application

This Python script cuts a **video clip** inside the FinalCut Pro video editor based on **human-written instructions**.

See [README.md](https://github.com/DmytroNorth/Text_To_Video_Edits-fcp/blob/main/README.md) for **information** about the project.

## Table of contents

1. Setting up  
1.1 Importing packages  
1.2 Reading videoedits.txt file  
1.3 Reading clip.fcpxml file

2. Validation and cleaning  
2.1 Timecode validation  
2.2 String cleaning and conversion

3. Dataframe conversion and formatting  
3.1 Dataframe conversion  
3.2 Dataframe formatting to total seconds

4. Dataframe operations  
4.1 Adding calculated columns to a dataframe

5. String assembly  
5.1 Splitting and storing strings in a list  
5.2 Assembling strings with "for loop"

6. XML assembly

7. Open export.fcpxml in FinalCut Pro
  
## 1. Setting up
The input files we need are:
* a `.txt` file with edit decisions, where each cut has its beginning and ending timecode on the same line
* a `.fcpxml` file, which is a proprietary file format of Apple's FinalCut Pro video editing software.
It is obtained by exporting an existing project from FinalCut Pro with `File -> Export XML`.
The project must contain at least one compound clip with at least one cut.
 
### 1.1 Importing packages
We start with importing:
* `pandas` data analysis and manipulation tool for `dataframe` and `timedelta` operations
* `re` module to perform **regular expression** operations

```python
#!/usr/bin/env python
import re
import pandas as pd
```


### 1.2 Reading videoedits.txt file
Our sample `videoedits.txt` file contains written instructions, which will be discarded, and timecodes in various formats. The timecodes are given in either `minutes:seconds` or `hours:minutes:seconds` format, so we will bring them to a single format in section 2.

```python
# reading the .txt file with the edit instructions
with open('videoedits.txt') as f:
    edt = f.read()
print(edt)
```

```
00:25 Starts speaking here… Ends speaking here 01:02
02:38 Leave this part - repeats 3 times. Ends here 03:7
the beginning is at 04:17 all the way up to 4:31 including the action
this line without timecode will be discarded
this line with a single time code 05:48 will also be discarded
from 1:4:40 to 01:4:48
```


### 1.3 Reading clip.fcpxml file
Our sample `clip.fcpxml` file contains all the information about the video clips we are using, as well as all the changes made to them inside FinalCut Pro. We will be working with the `<ref-clip>` tags. In FinalCut Pro, a **compound clip** is a clip that contains multiple videos. Therefore, by cutting just one compound clip, we effectively cut all the videos stored inside it.

```python
# reading the .fcpxml file and printing its <ref-clip> tags
with open('clip.fcpxml') as f:
    fcp = f.read()
print('\n'.join(re.findall(r'<ref-clip.*?>', fcp)))
```

```
<ref-clip name="ocean_clip" offset="0s" ref="r2" duration="8000/3000s" start="15800/3000s"/>
<ref-clip name="ocean_clip" offset="8000/3000s" ref="r2" duration="8200/3000s" start="5900/3000s"/>
```


## 2. Validation and cleaning
Now let's use regular expressions with the `re` module to format and clean the contents of `videoedits.txt`.
### 2.1 Timecode validation
Here we add `00:` hours to all the timecodes that don't have the hour specified.

```python
# putting '00:' hours in where hours are missing
pat1 = r'(\s|^)(\d{1,2}:\d{1,2})(\s|$)'
repl1 = r'\g<1>00:\2\3'
edt1 = re.sub(pat1, repl1, edt, flags=re.MULTILINE)
print(edt1)
```

```
00:00:25 Starts speaking here… Ends speaking here 00:01:02
00:02:38 Leave this part - repeats 3 times. Ends here 00:03:7
the beginning is at 00:04:17 all the way up to 00:4:31 including the action
this line without timecode will be discarded
this line with a single time code 00:05:48 will also be discarded
from 1:4:40 to 01:4:48
```
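One edge case worth knowing: because the pattern above consumes the whitespace on both sides of a timecode, two `minutes:seconds` timecodes separated by a single space cannot both match. The sketch below is an alternative, not the script's method. It swaps the capturing `(\s|^)` and `(\s|$)` groups for lookarounds, which check the neighbouring characters without consuming them. The names `HOURLESS` and `add_missing_hours` are made up for this example.

```python
import re

# Hypothetical names, for illustration only.
# (?<!\S) and (?!\S) require whitespace or a string boundary on each side
# without consuming it, so neighbouring timecodes can both match.
HOURLESS = re.compile(r'(?<!\S)(\d{1,2}:\d{1,2})(?!\S)')

def add_missing_hours(text):
    """Prefix '00:' to every minutes:seconds timecode in text."""
    return HOURLESS.sub(r'00:\1', text)

print(add_missing_hours('00:25 01:02'))             # both timecodes get hours
print(add_missing_hours('from 1:4:40 to 01:4:48'))  # hours already present, unchanged
```

Since a newline counts as whitespace, this variant does not need the `re.MULTILINE` flag.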


### 2.2 String cleaning and conversion
Now we strip all the text and keep only the lines that contain two timecodes. We also convert the string into a list of tuples, each tuple representing the start and end of a cut.

```python
# pulling instances with two timecodes in one line
pat2 = r'.*?(\d{1,2}:\d{1,2}:\d{1,2}).*?(\d{1,2}:\d{1,2}:\d{1,2})'
edt2 = re.findall(pat2, edt1)
print(edt2)
```

```
[('00:00:25', '00:01:02'), ('00:02:38', '00:03:7'), ('00:04:17', '00:4:31'), ('1:4:40', '01:4:48')]
```
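Lines with fewer than two timecodes are dropped silently by the regex above. If it helps to see which instructions were ignored, a line-by-line variant could collect them. This is only a sketch: `split_lines` and `TIMECODE` are hypothetical names, and unlike the single regex it keeps just the first two timecodes of a line.

```python
import re

# Hypothetical helper, not part of the original script.
TIMECODE = re.compile(r'\d{1,2}:\d{1,2}:\d{1,2}')

def split_lines(text):
    """Return (kept, skipped): timecode pairs and the non-empty lines without a pair."""
    kept, skipped = [], []
    for line in text.splitlines():
        codes = TIMECODE.findall(line)
        if len(codes) >= 2:
            kept.append((codes[0], codes[1]))  # first two timecodes form the cut
        elif line.strip():
            skipped.append(line)               # report what was ignored
    return kept, skipped

sample = ("00:00:25 Starts speaking here, ends 00:01:02\n"
          "this line without timecode will be discarded\n"
          "single time code 00:05:48 only")
kept, skipped = split_lines(sample)
print(kept)
print(skipped)
```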


## 3. Dataframe conversion and formatting
Since we need to convert the timecodes to `timedelta` and perform some calculations, let's convert the list into a dataframe with `pandas` and format the values as total seconds.
### 3.1 Dataframe conversion
Here we are converting to a dataframe with two columns and assigning their names.


```python
# converting to dataframe
df = pd.DataFrame(edt2, columns=['start', 'end'])
print(df)
```

```
      start       end
0  00:00:25  00:01:02
1  00:02:38   00:03:7
2  00:04:17   00:4:31
3    1:4:40   01:4:48
```


### 3.2 Dataframe formatting to total seconds
To convert to total seconds, let's create a custom function and apply it to both columns. The custom function `str_to_sec` first converts the strings to the `timedelta` data type, then calculates total seconds, and finally casts the result to `int`.


```python
# building custom function to convert string to total seconds
def str_to_sec(colname):
    colname = pd.to_timedelta(colname)
    return colname.dt.total_seconds().astype(int)

# applying custom function to both columns
df[['start', 'end']] = df[['start', 'end']].apply(str_to_sec)
print(df)
```

```
   start   end
0     25    62
1    158   187
2    257   271
3   3880  3888
```
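The arithmetic behind the conversion is `hours * 3600 + minutes * 60 + seconds`. As a cross-check that does not rely on `pandas`, the same numbers can be reproduced in plain Python. The function name `timecode_to_seconds` is hypothetical and only used here.

```python
def timecode_to_seconds(timecode):
    """Convert an 'hours:minutes:seconds' string to total seconds."""
    hours, minutes, seconds = (int(part) for part in timecode.split(':'))
    return hours * 3600 + minutes * 60 + seconds

# Unpadded fields such as '7' or '4' are fine because int() ignores the width.
print(timecode_to_seconds('00:03:7'))  # 0*3600 + 3*60 + 7
print(timecode_to_seconds('1:4:40'))   # 1*3600 + 4*60 + 40
```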


## 4. Dataframe operations
Let's have another look at the `<ref-clip>` tag.

`<ref-clip name="ocean_clip" offset="8000/3000s" ref="r2" duration="8200/3000s" start="5900/3000s"/>`

The values we need to replace are:
* `offset` - the relative position of a clip chunk in the timeline.
* `duration` - the duration of the clip chunk.
* `start` - the timecode in the source clip where the chunk begins, which is our `start` column.
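The sample tags write these values as rational seconds with a trailing `s`, for example `8000/3000s`. To read them as numbers, a small sketch with the standard `fractions` module is enough. The helper name `parse_fcp_time` is an assumption made for this example.

```python
from fractions import Fraction

def parse_fcp_time(value):
    """Convert a time string such as '8000/3000s' or '0s' to a Fraction of seconds."""
    return Fraction(value.rstrip('s'))

first_offset = parse_fcp_time('0s')
first_duration = parse_fcp_time('8000/3000s')
second_offset = parse_fcp_time('8000/3000s')

# The second chunk starts exactly where the first one ends.
print(first_offset + first_duration == second_offset)
print(float(first_duration))
```

This also shows the relation the next step relies on: each `offset` equals the sum of the durations of all earlier chunks.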
### 4.1 Adding calculated columns to a dataframe
Let's add a `duration` column holding the difference between the `end` and `start` columns, and an `offset` column holding the cumulative sum of `duration` shifted down by one row, so that each chunk starts where the previous ones end. Let's also fill `NA` with zeroes and convert all values to the `integer` type.


```python
# subtraction with df
df['duration'] = df['end'] - df['start']
# calculating offset column and shifting one row below
df['offset'] = df['duration'].cumsum().shift(+1)
# filling NA values with 0 and assigning type integer
df = df.fillna(0).astype(int)
print(df[['start', 'duration', 'offset']])
```

```
   start  duration  offset
0     25        37       0
1    158        29      37
2    257        14      66
3   3880         8      80
```
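To make the shifted cumulative sum less abstract, here is the same calculation on plain lists, using the values from the table above. It is a sketch for illustration and not a replacement for the dataframe code.

```python
from itertools import accumulate

starts = [25, 158, 257, 3880]
ends = [62, 187, 271, 3888]

# duration of each chunk: end minus start
durations = [end - start for start, end in zip(starts, ends)]

# offset of each chunk: total duration of all chunks before it,
# so the first chunk sits at 0 and the last running total is not needed
offsets = [0] + list(accumulate(durations))[:-1]

print(durations)
print(offsets)
```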


## 5. String assembly
Now that we have all the values stored neatly in the dataframe, we can parse the existing `<ref-clip>` tags and replace their values with ours.
### 5.1 Splitting and storing strings in a list
Here we recreate the `<ref-clip>` tag as a **regex** and, with the help of **capture groups**, create a **list of tuples**. Each tuple contains the four strings needed to assemble the tag.

```python
# pulling chunks of strings from fcpxml to assemble <ref-clip> tag
pat3 = r'(.*?<ref-clip.*?offset=").*?(".*duration=").*?(".*start=").*?(".*)'
fcp1 = re.findall(pat3, fcp)
print(fcp1)
```

```
[('                        <ref-clip name="ocean_clip" offset="', '" ref="r2" duration="', '" start="', '"/>'), ('                        <ref-clip name="ocean_clip" offset="', '" ref="r2" duration="', '" start="', '"/>')]
```


### 5.2 Assembling strings with "for loop"
Let's use the first tuple of the list and cycle through all rows of our dataframe to assemble `<ref-clip>` tags. Let's then convert the resulting list into a string.

```python
# assembling one <ref-clip> tag per dataframe row from the first tuple
pre, mid1, mid2, post = fcp1[0]
lcomb = []
for row in df.itertuples():
    lcomb.append(f'{pre}{row.offset}{mid1}{row.duration}{mid2}{row.start}{post}')

# converting list into a string
edt3 = '\n'.join(lcomb)
print(edt3)
```

```
                        <ref-clip name="ocean_clip" offset="0" ref="r2" duration="37" start="25"/>
                        <ref-clip name="ocean_clip" offset="37" ref="r2" duration="29" start="158"/>
                        <ref-clip name="ocean_clip" offset="66" ref="r2" duration="14" start="257"/>
                        <ref-clip name="ocean_clip" offset="80" ref="r2" duration="8" start="3880"/>
```
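The `<ref-clip>` tags in `clip.fcpxml` write time values with a trailing `s` (`8000/3000s`), while the tags assembled above contain bare integers, because the regex drops everything between the quotes. Should an import ever reject the bare values, one option is to build the tag with the suffix kept. This is a sketch under that assumption. `ref_clip_tag`, its parameters, and the default indent are hypothetical and not part of the script.

```python
def ref_clip_tag(name, ref, offset, duration, start, indent=24):
    """Build one <ref-clip> tag with whole-second values written as e.g. '37s'."""
    pad = ' ' * indent
    return (f'{pad}<ref-clip name="{name}" offset="{offset}s" ref="{ref}" '
            f'duration="{duration}s" start="{start}s"/>')

# Values taken from the first row of the dataframe above.
print(ref_clip_tag('ocean_clip', 'r2', 0, 37, 25))
```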


## 6. XML assembly
Now that we have the new `<ref-clip>` tags stored in one string, we just need to replace the old tags with the new ones and save the result as a new `.fcpxml` file.

```python
# replacing the old block of ref-clip tags with the newly assembled one
pat4 = r'( +<ref-clip.*?\n)+'
repl4 = edt3 + '\n'
fcp2 = re.sub(pat4, repl4, fcp)

# writing to a new .fcpxml file
with open('export.fcpxml', 'w') as newfile:
    newfile.write(fcp2)
print('\n'.join(re.findall(r'.*?<ref-clip.*?>', fcp2)))
```

```
                        <ref-clip name="ocean_clip" offset="0" ref="r2" duration="37" start="25"/>
                        <ref-clip name="ocean_clip" offset="37" ref="r2" duration="29" start="158"/>
                        <ref-clip name="ocean_clip" offset="66" ref="r2" duration="14" start="257"/>
                        <ref-clip name="ocean_clip" offset="80" ref="r2" duration="8" start="3880"/>
```
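Before opening the file, it can be reassuring to check that the chunks sit back to back on the timeline, meaning every `offset` equals the sum of the durations before it. The check below is a sketch that works on whole-second values like the ones printed above. `is_gapless` is a hypothetical helper name.

```python
import re

def is_gapless(tags):
    """Return True if each offset equals the total duration of the chunks before it."""
    pairs = re.findall(r'offset="(\d+)".*?duration="(\d+)"', tags)
    expected = 0
    for offset, duration in pairs:
        if int(offset) != expected:
            return False
        expected += int(duration)
    return True

tags = '''<ref-clip name="ocean_clip" offset="0" ref="r2" duration="37" start="25"/>
<ref-clip name="ocean_clip" offset="37" ref="r2" duration="29" start="158"/>
<ref-clip name="ocean_clip" offset="66" ref="r2" duration="14" start="257"/>
<ref-clip name="ocean_clip" offset="80" ref="r2" duration="8" start="3880"/>'''
print(is_gapless(tags))
```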


## 7. Open export.fcpxml in FinalCut Pro
Now we can open the newly created `export.fcpxml` file in FinalCut Pro, and the clip will be edited according to the initial text instructions.

![finalcut pro compound clip automatically edited based on text](png/txtvid1.png)