# Extract functions from converter

We need to extract

```
torch.log, torch.Tensor.log, torch.log_, torch.Tensor.log_
```

from the converter programmatically.

They will be passed to an AI model, which will generate inputs for these functions.

In [None]:
@converter(torch.log, torch.Tensor.log, torch.log_, torch.Tensor.log_, channel_ordering_strategy=ChannelOrderingStrategy.MINIMUM_TRANSPOSITIONS)
def converter_log(input: Tensor, *, out: Optional[Tensor]=None):
    def func(input, *, out: Optional[Tensor]=None):
        return tf.math.log(input)
    return func
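
One possible way to extract the targets without importing the converter module is to parse its source with the standard `ast` module and read the positional arguments of the decorator. This is only a sketch: `CONVERTER_SOURCE` and `extract_converter_targets` are names made up for illustration, the decorator is assumed to be spelled `converter`, and `ast.unparse` requires Python 3.9 or newer.

```python
import ast

# Hypothetical: the converter source as text (in practice, read from the converter file).
CONVERTER_SOURCE = '''
@converter(torch.log, torch.Tensor.log, torch.log_, torch.Tensor.log_,
           channel_ordering_strategy=ChannelOrderingStrategy.MINIMUM_TRANSPOSITIONS)
def converter_log(input, *, out=None):
    def func(input, *, out=None):
        return tf.math.log(input)
    return func
'''


def extract_converter_targets(source: str, decorator_name: str = "converter") -> dict:
    """Map each decorated function name to the positional arguments of its decorator."""
    targets = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        for deco in node.decorator_list:
            # Only decorator calls such as @converter(...), not a bare @converter
            if not isinstance(deco, ast.Call):
                continue
            callee = deco.func
            name = callee.attr if isinstance(callee, ast.Attribute) else getattr(callee, "id", None)
            if name == decorator_name:
                # Keyword arguments (e.g. channel_ordering_strategy) are ignored on purpose
                targets[node.name] = [ast.unparse(arg) for arg in deco.args]
    return targets


print(extract_converter_targets(CONVERTER_SOURCE))
```

Because the source is only parsed and never executed, neither `torch` nor `tf` has to be installed for this step.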


# Unittest Template


In [None]:
# Function target is assumed

# The target to be converted.
# Kept as a string so that it renders as source code inside the template.
func = 'torch.log'

# A name that properly represents the functions. Ex) log, do_something.
# Option1 : Use the name of the first function. (preferred)
# Option2 : Ask AI.
# Whichever option we choose, the name must be refined afterwards.
func_name = "log"

# First Letter and letter after _ are capitalized. Ex) Log, DoSomething.
Func_name = "Log"

# Inputs as a list generated by AI.
# The list is unpacked when it is passed to the model.
# Would kwargs be better than args? It would be nice to try both.
# Do we need the 'r' prefix? Maybe not, but make sure the string contains no special characters.
# forward() takes a single tensor, so the list holds one element.
inputs = r'[torch.ones(1, 10, 20)]'

template = f"""
def test_{func_name}_converter(self):
    # Define the {Func_name} model inside the test
    class {Func_name}(torch.nn.Module):
        def __init__(self):
            super({Func_name}, self).__init__()

        def forward(self, input_tensor):
            return {func}(input_tensor)

    # Initialize the model (the inputs are supplied through `args` below)
    torch_model = {Func_name}()
    torch_model.eval()

    # Convert the model and ensure the HTML trace is saved
    keras_model = nobuco.pytorch_to_keras(
        torch_model,
        args=[*{inputs}], kwargs=None,
        inputs_channel_order=nobuco.ChannelOrder.TENSORFLOW,
        outputs_channel_order=nobuco.ChannelOrder.TENSORFLOW,
        save_trace_html=True
    )

    # Read the contents of the trace.html file
    with open('trace.html', 'r', encoding='utf-8') as file:
        trace_html = file.read()

    # Assertions for the content of trace_html
    self.assertNotIn('Max diff', trace_html, "The trace HTML should not contain 'Max diff'")
"""

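The two names used by the template could be derived from the dotted target name rather than typed by hand. The helpers below are a minimal sketch of "Option1"; `to_func_name` and `to_class_name` are hypothetical names, and dropping the trailing underscore of in-place variants such as `log_` is an assumption about how the names should be refined.

```python
def to_func_name(dotted_name: str) -> str:
    """'torch.Tensor.log_' -> 'log': keep the last component, drop trailing underscores."""
    return dotted_name.rsplit(".", 1)[-1].rstrip("_")


def to_class_name(func_name: str) -> str:
    """'do_something' -> 'DoSomething': capitalize the first letter of each part."""
    return "".join(part[:1].upper() + part[1:] for part in func_name.split("_") if part)


func_name = to_func_name("torch.log")
Func_name = to_class_name(func_name)
print(func_name, Func_name)
```
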
In [None]:
# Class target is assumed

# The target to be converted
constructor = 'torch.nn.Conv2d'

# A name that properly represents the class. Ex) conv2d, batch_norm.
# Option1 : Use the name of the class. (preferred)
# Option2 : Ask AI.
# Whichever option we choose, the name must be refined afterwards.
class_name = "conv2d"

# Inputs as a list generated by AI.
# It will be unpacked to be passed.
# Would kwargs be better than args? It would be nice to try both.
# Conv2d expects a single (N, C, H, W) tensor whose C matches in_channels.
inputs = r'[torch.ones(1, 10, 20, 20)]'

# Additional inputs are required to construct the target instance.
# Try kwargs here.
construction_args = r"{'in_channels': 10, 'out_channels': 10, 'kernel_size': 1}"

template = f"""
def test_{class_name}_converter(self):
    # Initialize the model directly from its constructor
    torch_model = {constructor}(**{construction_args})
    torch_model.eval()

    # Define the input tensors
    inputs = {inputs}

    # Convert the model and ensure the HTML trace is saved
    keras_model = nobuco.pytorch_to_keras(
        torch_model,
        args=[*inputs], kwargs=None,
        inputs_channel_order=nobuco.ChannelOrder.TENSORFLOW,
        outputs_channel_order=nobuco.ChannelOrder.TENSORFLOW,
        save_trace_html=True
    )

    # Read the contents of the trace.html file
    with open('trace.html', 'r', encoding='utf-8') as file:
        trace_html = file.read()

    # Assertions for the content of trace_html
    self.assertNotIn('Max diff', trace_html, "The trace HTML should not contain 'Max diff'")
"""

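Since the templates are assembled from strings, a cheap sanity check before writing them into a test file is to parse the rendered text. The sketch below uses only the standard `ast` module; `syntax_error_in` is a hypothetical helper name. It catches syntax mistakes, for example a dict literal that uses `=` instead of `:`, but it does not check that the code runs correctly.

```python
import ast
from typing import Optional


def syntax_error_in(source: str) -> Optional[str]:
    """Return None if `source` parses as Python, else a short error description."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


good_kwargs = "{'in_channels': 10, 'out_channels': 10, 'kernel_size': 1}"
bad_kwargs = "{'in_channels': 10, 'out_channels': 10, 'kernel_size'=1}"

# The first call reports no problem, the second returns an error message.
print(syntax_error_in(f"torch_model = torch.nn.Conv2d(**{good_kwargs})"))
print(syntax_error_in(f"torch_model = torch.nn.Conv2d(**{bad_kwargs})"))
```
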
# `__doc__`

The documentation can be extracted programmatically.
This information can be very useful for the AI's tasks.


## Script

```
import torch

print(torch.nn.Conv2d.__doc__)
```


## Full Output


```
Applies a 2D convolution over an input signal composed of several input
    planes.

    In the simplest case, the output value of the layer with input size
    :math:`(N, C_{\text{in}}, H, W)` and output :math:`(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})`
    can be precisely described as:

    .. math::
        \text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) +
        \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)


    where :math:`\star` is the valid 2D `cross-correlation`_ operator,
    :math:`N` is a batch size, :math:`C` denotes a number of channels,
    :math:`H` is a height of input planes in pixels, and :math:`W` is
    width in pixels.
    

    This module supports :ref:`TensorFloat32<tf32_on_ampere>`.

    On certain ROCm devices, when using float16 inputs this module will use :ref:`different precision<fp16_on_mi200>` for backward.

    * :attr:`stride` controls the stride for the cross-correlation, a single
      number or a tuple.

    * :attr:`padding` controls the amount of padding applied to the input. It
      can be either a string {'valid', 'same'} or an int / a tuple of ints giving the
      amount of implicit padding applied on both sides.

    * :attr:`dilation` controls the spacing between the kernel points; also
      known as the à trous algorithm. It is harder to describe, but this `link`_
      has a nice visualization of what :attr:`dilation` does.

    * :attr:`groups` controls the connections between inputs and outputs.
      :attr:`in_channels` and :attr:`out_channels` must both be divisible by
      :attr:`groups`. For example,

        * At groups=1, all inputs are convolved to all outputs.
        * At groups=2, the operation becomes equivalent to having two conv
          layers side by side, each seeing half the input channels
          and producing half the output channels, and both subsequently
          concatenated.
        * At groups= :attr:`in_channels`, each input channel is convolved with
          its own set of filters (of size
          :math:`\frac{\text{out\_channels}}{\text{in\_channels}}`).

    The parameters :attr:`kernel_size`, :attr:`stride`, :attr:`padding`, :attr:`dilation` can either be:

        - a single ``int`` -- in which case the same value is used for the height and width dimension
        - a ``tuple`` of two ints -- in which case, the first `int` is used for the height dimension,
          and the second `int` for the width dimension

    Note:
        When `groups == in_channels` and `out_channels == K * in_channels`,
        where `K` is a positive integer, this operation is also known as a "depthwise convolution".

        In other words, for an input of size :math:`(N, C_{in}, L_{in})`,
        a depthwise convolution with a depthwise multiplier `K` can be performed with the arguments
        :math:`(C_\text{in}=C_\text{in}, C_\text{out}=C_\text{in} \times \text{K}, ..., \text{groups}=C_\text{in})`.

    Note:
        In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting ``torch.backends.cudnn.deterministic = True``. See :doc:`/notes/randomness` for more information.

    Note:
        ``padding='valid'`` is the same as no padding. ``padding='same'`` pads
        the input so the output has the shape as the input. However, this mode
        doesn't support any stride values other than 1.

    Note:
        This module supports complex data types i.e. ``complex32, complex64, complex128``.

    Args:
        in_channels (int): Number of channels in the input image
        out_channels (int): Number of channels produced by the convolution
        kernel_size (int or tuple): Size of the convolving kernel
        stride (int or tuple, optional): Stride of the convolution. Default: 1
        padding (int, tuple or str, optional): Padding added to all four sides of
            the input. Default: 0
        padding_mode (str, optional): ``'zeros'``, ``'reflect'``,
            ``'replicate'`` or ``'circular'``. Default: ``'zeros'``
        dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
        groups (int, optional): Number of blocked connections from input
            channels to output channels. Default: 1
        bias (bool, optional): If ``True``, adds a learnable bias to the
            output. Default: ``True``
    

    Shape:
        - Input: :math:`(N, C_{in}, H_{in}, W_{in})` or :math:`(C_{in}, H_{in}, W_{in})`
        - Output: :math:`(N, C_{out}, H_{out}, W_{out})` or :math:`(C_{out}, H_{out}, W_{out})`, where

          .. math::
              H_{out} = \left\lfloor\frac{H_{in}  + 2 \times \text{padding}[0] - \text{dilation}[0]
                        \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor

          .. math::
              W_{out} = \left\lfloor\frac{W_{in}  + 2 \times \text{padding}[1] - \text{dilation}[1]
                        \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor

    Attributes:
        weight (Tensor): the learnable weights of the module of shape
            :math:`(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}},`
            :math:`\text{kernel\_size[0]}, \text{kernel\_size[1]})`.
            The values of these weights are sampled from
            :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
            :math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
        bias (Tensor):   the learnable bias of the module of shape
            (out_channels). If :attr:`bias` is ``True``,
            then the values of these weights are
            sampled from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
            :math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`

    Examples:

        >>> # With square kernels and equal stride
        >>> m = nn.Conv2d(16, 33, 3, stride=2)
        >>> # non-square kernels and unequal stride and with padding
        >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
        >>> # non-square kernels and unequal stride and with padding and dilation
        >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
        >>> input = torch.randn(20, 16, 50, 100)
        >>> output = m(input)

    .. _cross-correlation:
        https://en.wikipedia.org/wiki/Cross-correlation

    .. _link:
        https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md
```

## Interesting Parts

- Args

```
    Args:
        in_channels (int): Number of channels in the input image
        out_channels (int): Number of channels produced by the convolution
        kernel_size (int or tuple): Size of the convolving kernel
        stride (int or tuple, optional): Stride of the convolution. Default: 1
        padding (int, tuple or str, optional): Padding added to all four sides of
            the input. Default: 0
        padding_mode (str, optional): ``'zeros'``, ``'reflect'``,
            ``'replicate'`` or ``'circular'``. Default: ``'zeros'``
        dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
        groups (int, optional): Number of blocked connections from input
            channels to output channels. Default: 1
        bias (bool, optional): If ``True``, adds a learnable bias to the
            output. Default: ``True``
```

- Shape

```
    Shape:
        - Input: :math:`(N, C_{in}, H_{in}, W_{in})` or :math:`(C_{in}, H_{in}, W_{in})`
        - Output: :math:`(N, C_{out}, H_{out}, W_{out})` or :math:`(C_{out}, H_{out}, W_{out})`, where

          .. math::
              H_{out} = \left\lfloor\frac{H_{in}  + 2 \times \text{padding}[0] - \text{dilation}[0]
                        \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor

          .. math::
              W_{out} = \left\lfloor\frac{W_{in}  + 2 \times \text{padding}[1] - \text{dilation}[1]
                        \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor
```

- Examples

```
    Examples:

        >>> # With square kernels and equal stride
        >>> m = nn.Conv2d(16, 33, 3, stride=2)
        >>> # non-square kernels and unequal stride and with padding
        >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
        >>> # non-square kernels and unequal stride and with padding and dilation
        >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
        >>> input = torch.randn(20, 16, 50, 100)
        >>> output = m(input)
```

# Can it be fully deterministic? (no AI intervention)

The `__doc__` attribute provides plenty of information. Can we implement the automation purely programmatically?

Probably not, because we cannot be sure that every `__doc__` follows the same format.

Should we still investigate the feasibility?
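
A quick feasibility check could count how many targets actually have the sections we rely on. The sketch below is an assumption about how such a survey might look; `section_coverage` and the three toy classes are hypothetical stand-ins so that the example runs without `torch`.

```python
import inspect
import re
from collections import Counter

HEADER = re.compile(r"^([A-Z]\w*):\s*$", re.MULTILINE)


def section_coverage(objects, wanted=("Args", "Shape", "Examples")) -> Counter:
    """Count, per wanted header, how many objects have it in their docstring."""
    counts = Counter()
    for obj in objects:
        doc = inspect.getdoc(obj) or ""
        headers = set(HEADER.findall(doc))
        counts.update(h for h in wanted if h in headers)
    return counts


class WithAll:
    """Toy.

    Args:
        x (int): value

    Shape:
        - Input: (N,)

    Examples:

        >>> WithAll()
    """


class ArgsOnly:
    """Toy.

    Args:
        x (int): value
    """


class NoDoc:
    pass


print(section_coverage([WithAll, ArgsOnly, NoDoc]))
```

For the real survey, the list of objects could be something like the classes found in `vars(torch.nn)`. If coverage turns out to be low, that would support keeping the AI in the loop.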

# Do we need RAG (Retrieval-Augmented Generation)?

- If we use ChatGPT, that functionality is built in. However, if the `__doc__` is too long, we must shorten it first.
- If we use a local LLM, it is a plain model that generates outputs directly from the prompt.
- RAG may or may not improve performance. We can select the necessary information, such as args, shape, and examples, which is already very powerful. However, we do not know whether some layers or functions carry other important information.
- Conclusion: not necessary, so we implement without RAG. Once the initial implementation is done, we can add RAG for further improvement.
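
Shortening a long `__doc__` could be as simple as concatenating the selected sections in priority order until a size budget is reached. This is a sketch under assumptions: `build_context` is a hypothetical helper, the input is a plain `{header: body}` dict, and the character count is only a rough proxy for the model's token limit.

```python
def build_context(sections: dict, priority=("Args", "Shape", "Examples"), max_chars: int = 2000) -> str:
    """Join the prioritized sections in order, stopping before the budget is exceeded."""
    parts, used = [], 0
    for name in priority:
        body = sections.get(name)
        if not body:
            continue
        chunk = f"{name}:\n{body}"
        # The budget is approximate: separators between chunks are not counted.
        if used + len(chunk) > max_chars:
            break
        parts.append(chunk)
        used += len(chunk)
    return "\n\n".join(parts)


demo = {
    "Args": "in_channels (int): Number of channels in the input image",
    "Shape": "- Input: (N, C, H, W)",
    "Examples": ">>> m = nn.Conv2d(16, 33, 3, stride=2)",
}
print(build_context(demo, max_chars=100))
```
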

In [None]:
# Phase1

# Prompt template blueprint
# Provide not just the exact target but also its context
# It is best to start with the smallest example

doc = 'args, shape, example from __doc__'
target = 'torch.nn.Conv2d'

example = f'''
target: torch.nn.Conv2d

output:
torch_model = torch.nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3)
'''

prompt = f'''

<<Documentation>>
{doc}
<</Documentation>>

<<Example>>
{example}
<</Example>>

Based on the context, generate the output.

target: {target}

output:
torch_model = {target}

'''


In [None]:
'''
```
output:
torch_model = {target}
```

Not sure about this part.
Some examples use this strategy, but I have had a negative experience with it.

Let's try different forms.

1.
output:
torch_model = {target}

2. 
output:

3.
output:
torch_model = 

It might not matter. Whatever the AI generates, we can extract the text after the target programmatically.
'''
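
Extracting the constructor call from whatever the AI returns could be done by locating the target and matching parentheses. The function below is a minimal sketch; `extract_call` is a hypothetical name, and parentheses inside string literals are not handled.

```python
from typing import Optional


def extract_call(completion: str, target: str) -> Optional[str]:
    """Return 'target(...)' from the completion using parenthesis matching, or None."""
    start = completion.find(target + "(")
    if start == -1:
        return None
    depth = 0
    for i in range(start + len(target), len(completion)):
        ch = completion[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return completion[start:i + 1]
    return None  # unbalanced parentheses


completion = (
    "output:\n"
    "torch_model = torch.nn.Conv2d(in_channels=1, out_channels=2, kernel_size=(3, 3))\n"
    "Let me know if you need anything else."
)
print(extract_call(completion, "torch.nn.Conv2d"))
```

Because the extraction only looks for the target itself, it works the same way for all three prompt forms listed above.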

In [None]:
# Phase 2

# Check whether the generated text successfully instantiates the target
# If the check fails, return to Phase 1
# If it passes, proceed to the next step

# Phase1

# Prompt template blueprint
# Provide not just the exact target but also its context
# It is best to start with the smallest example

doc = 'args, shape, example from __doc__'
target = 'torch.nn.Conv2d'

example = f'''
target: 
torch_model = torch.nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3)

output:
output = torch_model.forward(inputs)

'''

prompt = f'''

<<Documentation>>
{doc}
<</Documentation>>

<<Example>>
{example}
<</Example>>

Based on the context, generate the output.

target: {target}

output:


'''
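
The Phase 1 / Phase 2 loop described above might be sketched as follows. Everything here is an assumption for illustration: `generate` stands in for whichever model call is eventually used, `generate_until_valid` is a hypothetical name, and `FakeConv` replaces `torch.nn.Conv2d` so the example runs without `torch`. Executing model-generated code with `exec` should only be done in an isolated environment.

```python
from typing import Callable, Optional


def generate_until_valid(generate: Callable[[str], str], prompt: str,
                         namespace: dict, max_attempts: int = 3) -> Optional[str]:
    """Call `generate` until its output executes and defines `torch_model`, or give up."""
    for _ in range(max_attempts):
        code = generate(prompt)      # Phase 1: ask the model
        scope = dict(namespace)      # fresh copy so failed attempts leave no state behind
        try:
            exec(code, scope)        # Phase 2: does the text build an instance?
        except Exception:
            continue                 # failed: go back to Phase 1
        if "torch_model" in scope:
            return code
    return None


class FakeConv:
    """Stand-in for the real target class."""

    def __init__(self, in_channels, out_channels, kernel_size):
        self.kernel_size = kernel_size


# A fake model whose first answer is missing an argument and whose second answer is valid.
answers = iter(["torch_model = FakeConv(1, 2)", "torch_model = FakeConv(1, 2, 3)"])
accepted = generate_until_valid(lambda prompt: next(answers), "prompt text", {"FakeConv": FakeConv})
print(accepted)
```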